1.0 INTRODUCTION


This document is provided in response to the requirement stated in
Division 240 of Section 4.0: Statement of Work, Request for Proposal, of
the Multiresource Inventory Methods and Pilot Test, Phase I, South Carolina
Design and Implementation Planning Contract.

It supplements an earlier version of the component evaluation report. 

The preliminary version (Appendix A) contains a complete list of upper and
lower level Geographic Information System (GIS) components.

The current version is of a different character; it contains a description
of another step in the evolution of the MAIS system. The reason for the change
is stated in the following.

1.1 Approach

The old adage, "The whole is more than the sum of the parts", is especially
applicable to the evaluation of the prospective MAIS system. The assurance
that each component is a state of the art component in good working
condition does not imply that the same is true for the overall system.
Conversely, a smoothly functioning system must necessarily consist of good
working components.

An evaluation of single components is therefore not sufficient. One
must also be concerned with component interactions: their method of use and
the total effect on the entire system.

In the initial phases of the MAIS project, it was thought that a system
could be assembled from existing components. In the course of the work,
however, it became apparent that along with well-entrenched methods and
programs, some new concepts in software would be required to make the system
function. As a result, the main concern with the evaluation of the proposed
system became focused on the simple question: Will the basic concept work?

To answer this question, the Phase IA tests described in this document
were undertaken. It became clear that a positive answer was required
before one could proceed with a more thorough, component-by-component
evaluation to secure an optimum design. Also, it seemed wise to obtain this
answer before proceeding to the related question: How well do the concepts
work, as planned for Phase II?

This document, therefore, emphasizes actual testing of a "prototype" 
system; it does not consider the background of various components with 
regard to such matters as: state of the art maturity, availability, 

working status, and R & D requirements. The testing approach confronts one 
with the real problems which arise when all components must be exercised 
to provide a meaningful systems result, and thus concentrates the evaluation 
effort on those components which are currently the weakest links in the design. 

The MAIS system as proposed in the MAIS concept development document
basically consists of three major subsystems: the upper level GIS, the lower
level GIS, and the "linear model", which relates data in the two GIS systems. In
the course of the Phase IA work, it was discovered that the "linear model" 
was the most underrated part of these three subsystems. In this model the 
data from the GIS systems flow together and are combined to yield the desired 
estimates. It seemed that almost the entire "method of use" of the two GIS 
systems with their established techniques took on critical importance in 
this component. New methods and techniques not previously applied in this 
context were proposed to bring this about. 


As a consequence, much of the effort for the Phase IA test was expended 
on this subsystem, and much of this report is devoted to the further definition 
of the methods and techniques formulated for its programs. The term "linear 
model" was replaced with "estimation subsystem" to more accurately reflect its 
status and complexities in the overall MAIS system. 

1.2 Objectives 

The main objective was to process a limited set of data through a loosely 
assembled "prototype system". The first concern within this overall objective 
was to secure a proper data flow; that is, when data are entered at one end, 
results will emerge from the other. 

The second concern was to assess the data resulting from the data flow to 
ascertain that meaningful results can be obtained. Mindful of the GIGO 
concept, this concern seems perhaps more important than the first. However, 
a functioning system must necessarily exist before good results can be produced. 

An important aspect of the Phase IA test was therefore to assemble the 
prototype system from existing components and to construct preliminary versions 
of those elements which had to be newly created. It is hoped that the same
components can be used by the Phase II contractor to process data for the
16-county test area in South Carolina.

1.3 Scope 

The scope of the Phase IA test was necessarily limited by several factors. 
As the main objective of this test was to assure the workability of the 
proposed process, the question of the quality of the estimates could not be 
fully answered. The time frame did not allow for a full research effort, and
the area of interest may be too small to obtain an exhaustive answer.
Therefore, a great deal of the Phase II effort also must be concentrated
in this area.

Other limiting factors also played a role during the Phase IA test. 
Several resource parameters could not be completely processed through the 
system because of a lack of time for basic data input. The time factor 
was also a constraint in the analysis of those parameters for which a 
final data set was obtained. The overall philosophy for the Phase IA
effort was to do a limited number of tasks for a limited area, but to
try to complete these tasks as well as possible.

1.4 Prototype System

The meaning of the word prototype as used in this report must not be
misunderstood; an integrated software package running on a single machine was
not created. Rather, several methods, packages, machines and people at
various locations were involved. Some components were tightly integrated
packages; others were separate existing programs. Several programs were
newly created for the Phase IA effort.

The upper level GIS LANDSAT analysis was performed in two separate 
locations. The LANDSAT clustering analysis was performed in cooperation with 
the remote sensing group of the Space Sciences Laboratory at Berkeley on an 
image processing system developed around a Data General Nova 840 computer. 

All other upper level GIS processing was performed on EarthSat’s PRIME 450, 
located in the Washington, D.C., office. Lower level GIS processing was 
accomplished with the LANDPAK system on a PRIME 550, located at the premises 
of EarthSat’s Berkeley office. The same system was used to integrate all
data and to develop and run the estimation subsystem.

Although it would have been possible to perform all processing in-house,
the assistance of experienced personnel at the Space Sciences Laboratory was
procured for the LANDSAT classification to ensure a state-of-the-art effort.

1.5 Document Organization 

The remainder of this document is divided into two main parts. The
first, Section 2.1, contains the concepts and theory for the estimation
subsystem as well as a description of the programs developed for it. Some
of the programs can be incorporated into a permanent subsystem; most are
only temporary, written for specific tasks in the Phase IA tests.

The second part, Section 2.2, contains the report of the Phase IA testing
effort. The test area is described, and the input and data preparation 
techniques are outlined. The results and the analysis of the results for the 
parameters evaluated are presented. 

2.0 EVALUATION 

The proposed MAIS system is based on several new and innovative concepts.
Before the advent of current computer technology, resource information systems
mostly consisted of maps, aerial photos, and data files, with severe physical
limitations on the amount of numerical data that could be manipulated to pro- 
vide answers to management questions. The overall emphasis was on collecting 
new resource data for specific problems. The resource base itself was used 
as a data base with limited access to resource parameters through sampling 
methods. 


With the continuing development of computer technology, the capability 
to handle large statistical data files increased. Sampling and statistical 
techniques became more complex, and in the last decade the revolution in 
computer graphics has given rise to geographic information systems (GIS) 
which handle maps as well as descriptor data. With these systems it has 
become possible to create an accurate and comprehensive model of a resource 
base with which management problems can be evaluated, and actions can be 
simulated. It may, therefore, seem that the pendulum is swinging the other
way and that complete enumeration of the resource model is now reasonable 
in many cases. In these cases, sampling methods may have become obsolete. 

In the MAIS design it has been recognized that for large areas under 
diverse ownership, for which many different kinds of questions relating to 
separate disciplines must be answered, the GIS technology as well as 
statistical and sampling methods must play a significant role. For even 
if complete enumerations were possible, they might be more efficiently applied 
in large sample units by means of which estimates of required accuracies 
can be provided over large areas. 

Two basic approaches are possible when combining the sampling concept 
with the use of GIS systems. One might model the entire resource base and 
sample within the GIS system to obtain answers, or one might enter only 
selected portions of the resource base into the GIS system and fully enumerate 
these to obtain specific estimates. A combination of these two methods may 
also be used to take advantage of the best features of both. This is the 
concept favored in the MAIS design. 

Two GIS systems are involved. In the upper level system, the resource
base is represented in its entirety in the form of a LANDSAT classification
image, possibly combined with other auxiliary data of the same resolution.
This model can be cheaply enumerated in different ways, this being its main
advantage. A related advantage is the timeliness of the information in
the LANDSAT images. Disadvantages are poor resolution and limited
information with a sometimes unknown meaning.

To compensate, the lower level GIS stores high resolution detail, but 
only for selected sample areas. This information is also kept up-to-date, 
but it is mostly based on aerial photographs and ground data; hence, the 
update frequency is not as high as for the LANDSAT data. 

2.1 Estimation Subsystem 

Combining GIS technology with sampling and statistical techniques presents 
a unique challenge. In conventional sampling methods, randomness is mostly 
assured by selecting sample units in random locations. The GIS technology 
presumes, however, that the selection units stored in the GIS are permanent; 
additional units may be added, but the power of the system lies in its facility 
to accurately keep track of a given piece of land. The same problem has, of 
course, surfaced in the past in the transition from one-time inventories to 
the CFI approach. To reconcile these opposed concepts, some compromises need 
to be made, and it is worthwhile to consider what the tradeoffs are for the 
available options. 

The first option is to shift the emphasis from the random samples in the
lower level GIS to the upper level, where complete enumeration can easily be
made; to tie the upper level to the lower level by means of a model; and to
rely on the error distribution in this model for appropriate random effects.
A disadvantage of this approach is that a priori assumptions on the error
distribution in the model must hold for it to provide unbiased estimates.

The traditional approach in sampling methods has been to avoid estimators 
for which such a priori assumptions are required, and techniques have been 
developed to obtain robust, distribution-free estimates. 

The second option is to use these robust estimators and forego complete
location randomization by using the same sample (with possible added units),
as stored in the GIS subsystem, on successive occasions. This approach is
biased in the long run, because the population is limited to the areas
represented in the GIS. This may be somewhat compensated for by the use of
the upper level data, but mostly these act as an efficiency booster in the
overall design.

The tradeoffs will be illustrated in more detail in the following 
section. For the Phase IA test, only the first option has been explored 
in actual test computations. The second needs further theoretical development. 
Hopefully, this can be accomplished during Phase II. 

The remainder of the current section has been divided into three parts.
The general structure of the estimation subsystem is presented first. This
is followed by a more detailed discussion of the statistical formulation for
the MAIS design. The section is concluded with a description of the
estimation subsystem components.

2.1.1 Structure 

When examining the MAIS design from a sampling point of view for the
purpose of classifying the overall approach, it is clear that a multistage
"sampling" design is used, with "multi" equal to three. The stages in a
multistage design normally refer to the sampling steps (Murthy, 1967) with
which the target population is approached. In the multistage concept as
introduced to forest inventory by Langley et al (1969), each sampling step is also
associated with a type of imagery of a certain scale except for the final
stage, which refers to the ground level. In the MAIS design, this association
is more general, as any kind of useful information stored in the GIS systems
may be used at a given stage consistent with its spatial resolution. Thus,
for example, at the county level LANDSAT images alone may be used, or they
may be combined with NCIC data; or, one may prefer to use a general soils map
instead.


2.1.1.1 Data Structure


Each stage in the multistage view of the MAIS corresponds to a set of 
spatial data with matching attribute data of a given resolution. The 
nature of the data at each of these steps is as shown in column 2 of Table 1. 

TABLE 1

DATA OVERVIEW

Stage                  Type of Data                    Spatial Type   GIS
---------------------  ------------------------------  -------------  -----------
1 (Primary)            LANDSAT                         Cell           Upper Level
  Sample Unit, PSU     LANDSAT Classification
                       NCIC Data
                       Digitized Map Data of
                       Comparable Scale

2 (Secondary)          Aerial Photo Classifications    Polygon,       Lower Level
  Sample Unit, SSU     Digitized Map Data of           Line, Point
                       Comparable Scale

3 (Ultimate)           Field Data                      Point          Lower Level
  Sample Unit


Throughout the MAIS design concepts document, references are made to the
upper and lower level GIS systems, connected through the "linear model". In
this document, the linear model is incorporated into a more appropriate
estimation subsystem. More than one linear model may be applied to obtain one set
of estimates. From this perspective, the term "upper-lower level", as used
in the design concepts document, may be somewhat misleading; upper level
applies to the first stage, whereas lower level applies to both second and
third stages as indicated in column 4 of Table 1. The term "level", as used
in the design concepts document, is related to storage and resolution
requirements rather than sampling steps.

The basic resource units as stored in the GIS system have both a spatial
and an attribute component. Each cell has a class, and each polygon or line has
an associated descriptive record. The effect of the estimation subsystem is that
attribute data are divorced from their spatial equivalents to be summarized into
estimates which then apply to much larger spatial units. At the first stage,
cell classes are summarized into class proportions by PSU; at the second stage,
polygonal areas are converted into class proportions by SSU; and at the third
stage, fixed plot (USU) data are converted into averages by class. The
summarized data are then related together into county estimates. The components
to prepare the spatial data at each stage for use in the estimation subsystem
are described in Section 2.1.3.
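The first-stage summarization described above (cell classes reduced to class proportions for one PSU) can be sketched as follows. This is a minimal illustration only, not the extraction program used in the Phase IA tests; the function name `class_proportions` and the sample cell values are hypothetical.

```python
from collections import Counter


def class_proportions(cell_classes, classes):
    """Return the share of cells falling in each class, ordered as in `classes`."""
    total = len(cell_classes)
    if total == 0:
        raise ValueError("a sample unit must contain at least one cell")
    counts = Counter(cell_classes)
    # Classes that do not occur in this unit get a proportion of zero.
    return [counts.get(c, 0) / total for c in classes]


# Hypothetical PSU: eight classified cells, three LANDSAT cluster classes.
psu_cells = [1, 1, 2, 3, 3, 3, 1, 2]
e_j = class_proportions(psu_cells, classes=[1, 2, 3])
print(e_j)
```

The same routine would apply at the second stage if polygon areas were first rasterized or if area weights replaced the cell counts; that variation is not shown here.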

2.1.1.2 Processing Structure

Once the data at each stage have been properly summarized, the final
estimates are derived in two steps. The first step ties the first two stages
together; the second step combines the output from the first with the third
stage data to produce the final results. The entire process is schematically
shown in Figure 1.


DATA PREPARATION

Stage 1 (Class Proportions) --+
                              +--> STEP 1 --> Area Estimates --+
Stage 2 (Class Proportions) --+                                +--> STEP 2 --> Final Estimates
Stage 3 (Class Averages) --------------------------------------+

Figure 1. Processing Steps

Step 1 revolves entirely around the estimation of class or stratum
areas. Step 2 integrates the area estimates with parameter estimates, by
class, to obtain parameter estimates for the larger units, such as the
county. The proposed "sampling" technique for Step 1 is a variation of
regression sampling. The process for Step 2 can be largely characterized
by stratified sampling. As in double sampling for stratification, the
stratum or class areas are not fixed but are themselves random variables.
This is one of the unique aspects of the MAIS design.

The reason for first arriving at a set of area estimates independently
from the resource parameter estimates is the following: An alternative
approach could have been taken in which various parameter estimates would
have been propagated through all stages with independent linear models for
continuous variables; it would have been difficult, however, to maintain
known relationships between the parameters in this kind of propagation.
With the current approach, relationships existing at the ultimate stage
are not changed, as one only multiplies through by area.

A summary of the estimation subsystem components is presented in Table 2. 


TABLE 2

ESTIMATION COMPONENTS

STAGE   DATA PREPARATION                     STEP   ESTIMATION
-----   ----------------------------------   ----   -------------------------
1       LANDSAT Proportion Extraction        1      Area Estimation Component
2       Aerial Photo Proportion Extraction   2      Summary Component
3       Field Statistics Computation

Each of these components will be discussed in Section 2.1.3. However, 
first it will be necessary to examine the statistical rationale for Steps 1 
and 2. 

2.1.2 Statistical Formulation 

The Step 1 area estimation component is conceptually one of the most
important components of the MAIS system. In it, a novel approach has been
taken which, if proven successful, could set a new standard for incorporating
LANDSAT into multistage designs.

Traditionally, the aim of most LANDSAT classifications has been to
produce a class map for which there is an optimal one-to-one correspondence
between its classes and resource categories defined in some other, more
direct way. Results are usually expressed in contingency tables which,
when both marginal classifications are identical, are referred to as confusion
matrices (Colwell, 1979; Hildebrandt, 1979); or when this is not so, are
referred to as co-occurrence matrices (Isaacson, et al, 1979).


It is interesting to note that in recent years a new kind of emphasis has
been placed on this type of matrix. It seems to result from the realization
that a perfect diagonal confusion matrix is not attainable, at least not for
wildland resource classifications, and that therefore the matrix itself can
be a tool to interpret the classification. Consequently,
confusion matrices have been more carefully constructed using sampling
techniques (Mayer, 1979; Sader, 1979; Todd, et al, 1980), and different kinds
of hypothesis tests have been applied (Isaacson, et al, 1979; Sader, 1979;
Todd, et al, 1980).

The MAIS approach goes along with this development but adds a new point 
of view which, in effect, relaxes the one-to-one correspondence concept to 
the extent that any unsupervised classification may be of use. 

The device used is a "transition probability matrix" (Telser, 1963), or
a "projection matrix" (Pielou, 1969), which translates the stage one proportions
into stage two proportions as follows:

$$ a' = e'P \tag{0} $$

where a and e are K and L element class proportion vectors, and P is an L x K
matrix of "transition probabilities." P can be estimated from the sample
proportions of a set of matching PSU-SSU's with equal areas.
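As an illustration of the class transformation, the sketch below multiplies a vector of stage one proportions by an L x K matrix to obtain stage two proportions. The matrix values are invented for the example (each row sums to one, as transition probabilities would), and `transform_proportions` is a hypothetical helper name, not a MAIS program.

```python
def transform_proportions(e, P):
    """Compute the row-vector product e'P for an L-vector e and an L x K matrix P."""
    L = len(e)
    if len(P) != L:
        raise ValueError("P must have one row per stage one class")
    K = len(P[0])
    return [sum(e[l] * P[l][k] for l in range(L)) for k in range(K)]


# Hypothetical example: L = 3 stage one classes, K = 2 stage two classes.
e = [0.5, 0.3, 0.2]
P = [
    [0.9, 0.1],  # stage one class 1 is mostly stage two class 1
    [0.4, 0.6],
    [0.0, 1.0],  # stage one class 3 always maps to stage two class 2
]
a = transform_proportions(e, P)
print(a)  # stage two proportions; they sum to one because each row of P does
```

Note that nothing in this product requires a one-to-one correspondence between the two class sets, which is the point made in the text about unsupervised classifications.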

This estimation is not without problems, some with published solutions
only since 1976. The best known example in economics literature is an
application by Telser (1963), who estimated transition probabilities for
smokers switching between the brands of Lucky Strike, Chesterfield, and
Camel from year to year (1923).

Being able to estimate P and its covariance matrix takes the MAIS design
a step beyond the usual application of the confusion or co-occurrence matrix.
Rather than subjecting it to interpretation, it is used to directly translate
proportions from one stage to the other. For this reason P will be referred to
as a "class transformation matrix" in the remainder of this document. In the
following section we will assume that $\hat{P}$ and $\hat{\Sigma}_P$, the estimators for P and
its covariance matrix, have been computed. How this is accomplished is the
subject of Section 2.1.3.1.1.


2.1.2.1 Step 1 Estimators and Variances

The formulation given in the following is all based on the assumption 
that sampling is with replacement. 

Suppose that the PSU-SSU combinations have been located randomly and
that there are n such combinations taken from an infinite population situated
in the county. It is instructive to first consider the sampling properties
of the SSU proportions as an independent set. The SSU can be considered a
cluster of fixed size, and hence Cochran's theory for estimation of
proportions in cluster sampling can be applied (Cochran, 1963).

If $a_{ij}$ is the proportion of class i in SSU j, then the estimator for
the population proportion is as follows:

$$ \bar{a}_i = \frac{1}{n} \sum_{j=1}^{n} a_{ij} \tag{1} $$

An unbiased sample estimate of the variance is:

$$ s^2_{\bar{a}_i} = \frac{1}{n(n-1)} \sum_{j=1}^{n} (a_{ij} - \bar{a}_i)^2 \tag{2} $$


Since there are K classes in the SSU, and the class proportions are not
independent, the covariance of $\bar{a}_i$, $\bar{a}_k$ must also be considered:

$$ \widehat{\mathrm{cov}}(\bar{a}_i, \bar{a}_k) = \frac{1}{n(n-1)} \sum_{j=1}^{n} (a_{ij} - \bar{a}_i)(a_{kj} - \bar{a}_k) \tag{3} $$

Let the vector of the above K class $\bar{a}_i$ estimates of the county be denoted
by $\bar{a}$; then using (2) and (3), the estimated covariance matrix of $\bar{a}$, $\hat{\Sigma}_a$, can
be computed.
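The mean proportions and their covariance matrix can be computed together, since the variance is just the diagonal case of the covariance. The sketch below assumes the standard with-replacement forms (a mean over n SSU's and sums of cross-products divided by n(n-1)); the function name and the three sample SSU's are hypothetical.

```python
def mean_and_covariance(props):
    """props: list of n SSU proportion vectors, each with K class proportions.

    Returns the vector of mean proportions and the K x K estimated
    covariance matrix of those means.
    """
    n = len(props)
    if n < 2:
        raise ValueError("at least two sample units are needed")
    K = len(props[0])
    mean = [sum(row[i] for row in props) / n for i in range(K)]
    cov = [
        [
            sum((row[i] - mean[i]) * (row[k] - mean[k]) for row in props)
            / (n * (n - 1))
            for k in range(K)
        ]
        for i in range(K)
    ]
    return mean, cov


# Hypothetical sample of n = 3 SSU's with K = 2 classes.
ssu = [[0.6, 0.4], [0.8, 0.2], [0.7, 0.3]]
a_bar, sigma_a = mean_and_covariance(ssu)
print(a_bar)
print(sigma_a)
```

Because the proportions in each SSU sum to one, every row of the resulting covariance matrix sums to zero; this is the dependence between classes that the text refers to.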

Likewise, from the PSU's, the vector of L primary class proportions $\bar{e}$
can be computed, as well as its estimated covariance matrix $\hat{\Sigma}_e$.

The following estimators for the secondary county proportions are now
considered (estimators B and C in the following represent the two options
alluded to in Section 2.1).

A. Average SSU Class Proportions

The estimator is:

$$ \hat{a}_{av} = \bar{a} \tag{4} $$

The estimated covariance is defined as described above:

$$ \hat{\Sigma}_{av} = \hat{\Sigma}_a \tag{5} $$

If the PSU-SSU's are randomly located, the estimator is unbiased. The
primary stage is not considered in this estimator.


B. Regression Prediction

The estimator is:

$$ \hat{a}'_{pr} = e'\hat{P} \tag{6} $$

The estimated covariance is:

$$ \hat{\Sigma}_{pr} = E \hat{\Sigma}_P E' \tag{7} $$

Here e is the vector of L county proportions (enumeration), $\hat{P}$ is the
estimated class transformation matrix, and $\hat{\Sigma}_P$ is the estimated covariance matrix
of $\hat{P}$.


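$\hat{P}$ can be rearranged as a vector with M = L x K elements, and hence $\hat{\Sigma}_P$
is an M x M matrix. To be compatible with this matrix, e must also be
rearranged into a diagonally structured matrix E, as follows:

$$ E = \begin{bmatrix} e' & 0 & \cdots & 0 \\ 0 & e' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e' \end{bmatrix}_{K \times M} \tag{8} $$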


The estimator is unbiased if $\hat{P}$ is an unbiased estimator of P. This is
only the case if the traditional regression assumption holds; namely, the
errors are uncorrelated and have constant variance over the range of the
relationship.

The evaluation of $\hat{a}_{pr}$ is the subject of the Phase IA tests. Its
advantage for use in MAIS is that the random distribution of the sample is not
that important as long as a full range of conditions is present. The random
effects which occur around the linear relationship are what count. The
disadvantage is that the assumption on the error distribution in the model must
hold.

It is likely that the increase in efficiency (lower variance) with this
estimator is in the same order of magnitude as for the traditional regression
sampling estimator, namely $\rho^2 \times 100\%$, where $\rho$ is the correlation coefficient
(Zarcovic, 1964).
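A minimal sketch of the regression prediction estimator and its covariance follows. It assumes that the transformation matrix is vectorized column by column, so that the diagonally structured matrix E carries one copy of e' per secondary class; that ordering, the helper names, and the diagonal covariance matrix used in the example are all assumptions made for illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    return [list(col) for col in zip(*A)]


def build_E(e, K):
    """Place e' on the block diagonal K times, giving a K x (K*L) matrix."""
    L = len(e)
    E = [[0.0] * (K * L) for _ in range(K)]
    for k in range(K):
        for l in range(L):
            E[k][k * L + l] = e[l]
    return E


def regression_prediction(e, P_hat, sigma_P):
    """Return the predicted secondary proportions e'P and E * sigma_P * E'."""
    K = len(P_hat[0])
    a_pr = [sum(e[l] * P_hat[l][k] for l in range(len(e))) for k in range(K)]
    E = build_E(e, K)
    sigma_pr = matmul(matmul(E, sigma_P), transpose(E))
    return a_pr, sigma_pr


# Hypothetical inputs: L = 2, K = 2, so sigma_P is 4 x 4.
e = [0.6, 0.4]
P_hat = [[0.8, 0.2], [0.3, 0.7]]
sigma_P = [[0.01 if r == c else 0.0 for c in range(4)] for r in range(4)]
a_pr, sigma_pr = regression_prediction(e, P_hat, sigma_P)
print(a_pr)
print(sigma_pr)
```

A covariance matrix estimated from real data would normally have nonzero off diagonal terms, since the elements within one row of the transformation matrix are constrained to sum to one; the diagonal matrix here only keeps the arithmetic easy to follow.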

C. Regression Sampling

The estimator is:

$$ \hat{a}'_{rs} = \bar{a}' + (e - \bar{e})'\hat{P} \tag{9} $$

The estimated covariance is $\hat{\Sigma}_{rs}$ (defined below). It is conjectured that
the elements $c_{ij}$ of the estimated covariance matrix can be computed as follows:

[Equation (10) is not legible in the source.]

where $[(e_j - \bar{e})'\hat{P}]_i$ denotes the ith column of the row vector resulting from the
multiplication in the brackets and $\nu$ is the degrees of freedom used in the
regression error calculation.



A different approach to the estimated covariance is to note that (9)
contains two vectors and one matrix of random variables. The partial
derivatives of $\hat{a}_{rs}$ with respect to these variables can be obtained, and the
"delta method" of variance propagation can be applied, given that the
combined covariance matrix is available. Along the diagonal this matrix
is composed of the submatrices $\hat{\Sigma}_a$, $\hat{\Sigma}_e$, $\hat{\Sigma}_P$. As generally the covariance
of two random variables tends to zero when the sample size increases, the
off diagonal submatrices could possibly be neglected for a relatively large
sample.

The estimator $\hat{a}_{rs}$ is the multivariate equivalent of the traditional
regression sampling estimator. All remarks about its properties are based
on the description of this estimator as presented by Cochran (1963) and
Murthy (1967).

The estimator is unbiased if P is a preassigned matrix. However, if 
P is estimated from the sample, the estimator is biased; but the bias is 
small and becomes smaller with increasing sample size. 
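The regression sampling estimator adjusts the mean SSU proportions by the difference between the enumerated county proportions and the mean PSU proportions, passed through the class transformation matrix. The sketch below computes only the point estimate; the covariance is left out because its form is still described as a conjecture in the text. All names and numbers are hypothetical.

```python
def regression_sampling_estimate(a_bar, e_county, e_bar, P_hat):
    """Return a_bar' + (e_county - e_bar)' P_hat as a list of K proportions."""
    L = len(e_county)
    K = len(a_bar)
    diff = [e_county[l] - e_bar[l] for l in range(L)]
    return [
        a_bar[k] + sum(diff[l] * P_hat[l][k] for l in range(L)) for k in range(K)
    ]


# Hypothetical inputs for L = 2 primary and K = 2 secondary classes.
a_bar = [0.55, 0.45]     # mean SSU proportions from the sample
e_bar = [0.5, 0.5]       # mean PSU proportions from the same sample
e_county = [0.6, 0.4]    # enumerated county proportions
P_hat = [[0.8, 0.2], [0.3, 0.7]]
a_rs = regression_sampling_estimate(a_bar, e_county, e_bar, P_hat)
print(a_rs)
```

When the sample happens to match the county exactly (e_county equal to e_bar), the adjustment vanishes and the estimate reduces to the plain average of estimator A.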

The advantage of $\hat{a}_{rs}$ over $\hat{a}_{pr}$ is that it does not depend on assumptions
about error distributions in the regression model. The disadvantage is
that the sample must be randomly selected. A complication of this
estimator when dealing with proportions is discussed in Section 2.1.3.1.

The estimator was not tested in the Phase IA tests. It may be an
attractive alternative to the $\hat{a}_{pr}$ estimator because of its insensitivity to
the error distribution in the regression model. It is therefore hoped that
the opportunity will be present in Phase II to further develop the
covariance aspect and to test and compare this estimator with the
alternatives.


2.1.2.2 Step 2 Estimators and Variances

The Step 2 process integrates the stratum or secondary class area 
estimates with observations made in the field. Two types of observations 
can be made: discrete and continuous. From the USU point of view, a 
discrete variable is a class designation; from the class point of view, 
a USU is either in or out, and hence it can be considered a binary 
variable. 

First, discrete variables will be considered. 


2.1.2.2.1 Discrete Variables

One is confronted with the same problem present in Step 1, namely how
to translate from one set of class designations to another. Again, this
problem can be solved with a linear class transformation (Section 2.1.2),
but $\hat{P}_2$ and $\hat{\Sigma}_{P_2}$ must be obtained differently.

$\hat{P}_2$ can be obtained by constructing a table with the secondary class
definitions in the left margin and the ultimate definitions at the top.
Each element in the table is the proportion of the ground plots in secondary
class i that are identified as ultimate class j, or

$$ p_{ij} = n_{ij} / n_{i.} \tag{11} $$

where $n_{ij}$ is the number of ground plots in secondary class i and ultimate
class j, and $n_{i.}$ is the number of ground plots in secondary class i.

If there are K classes at the second stage and I classes at the
ultimate stage, then $\hat{P}_2$ is a K x I matrix; and its corresponding covariance
matrix is of dimension (K x I) x (K x I). An estimate of this matrix can
be obtained from the ground sample by computing the diagonal elements as

$$ c^{diag} = \frac{n_{ij}(n_{i.} - n_{ij})}{n_{i.}^3} \tag{12} $$

using the variance formula for the binomial distribution, and the off diagonal
elements as

[Equation (13) is only partly legible in the source: its numerator reads $n_{ij} n_{kl} (n_{i.} n_{k.} - n_{ij} n_{kl})$; the denominator is not recoverable.]
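The construction of the second transformation matrix from ground plot counts can be sketched as below. The diagonal variances use the standard binomial variance of a proportion, p(1 - p)/n, written in terms of counts; this is assumed to be what the diagonal formula expresses. The off diagonal elements are not attempted. The function name and the counts are hypothetical.

```python
def class_transformation_from_counts(counts):
    """counts[i][j]: ground plots in secondary class i and ultimate class j.

    Returns the K x I matrix of row proportions and a matrix of the same
    shape holding the binomial variance of each proportion.
    """
    P2 = []
    variances = []
    for row in counts:
        n_i = sum(row)
        if n_i == 0:
            raise ValueError("each secondary class needs at least one plot")
        P2.append([n_ij / n_i for n_ij in row])
        # p(1 - p)/n with p = n_ij/n_i simplifies to n_ij (n_i - n_ij) / n_i^3.
        variances.append([n_ij * (n_i - n_ij) / n_i ** 3 for n_ij in row])
    return P2, variances


# Hypothetical counts: K = 2 secondary classes (rows), I = 2 ultimate classes.
counts = [[8, 2], [5, 15]]
P2, var_P2 = class_transformation_from_counts(counts)
print(P2)
print(var_P2)
```

Each row of the resulting matrix sums to one, so a vector of secondary proportions multiplied through it remains a vector of proportions.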


In the first step, $\hat{P}_1$ and $\hat{\Sigma}_{P_1}$ were obtained from the primary and
secondary class proportions. The ground sample provides $\hat{P}_2$ and $\hat{\Sigma}_{P_2}$. An
estimate for the proportions of the ultimate (ground) classes is now
obtained as follows:

$$ \hat{g}' = \hat{a}'\hat{P}_2 \tag{14} $$

where $\hat{g}$ is a vector of ultimate class percentage estimates; $\hat{a}$ can be
obtained either as $\hat{a}_{rs}$ or $\hat{a}_{pr}$. In the latter case (14) can be rewritten as

$$ \hat{g}' = e'\hat{P}_1\hat{P}_2 \tag{15} $$


The covariance matrix of $\hat{g}$ again can be estimated with the delta
method. Assuming that $\hat{a}$ and $\hat{P}_2$ are independent, one can derive:

$$ \hat{\Sigma}_g = A \hat{\Sigma}_{P_2} A' + \hat{P}_2' \hat{\Sigma}_a \hat{P}_2 \tag{16} $$

where A is a diagonally banded version of $\hat{a}$. A subject of further
investigation must be whether the independence assumption holds. It is
also possible that the estimated covariance matrix of $\hat{P}_2$ can be obtained
in a better way.
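The delta-method covariance of the ultimate class proportions has two terms: one that propagates the uncertainty in the second transformation matrix through the secondary proportions, and one that propagates the uncertainty in the secondary proportions through the matrix. The sketch below assumes independence of the two inputs, as the text does, and assumes that the matrix is vectorized column by column so that A carries one copy of the secondary proportion vector per ultimate class. All names and numbers are hypothetical.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    return [list(col) for col in zip(*A)]


def ultimate_proportions(a, P2, sigma_a, sigma_P2):
    """Return g' = a'P2 and A*sigma_P2*A' + P2'*sigma_a*P2."""
    K, I = len(a), len(P2[0])
    g = [sum(a[k] * P2[k][i] for k in range(K)) for i in range(I)]
    # A is I x (K*I): row i holds a' in the block belonging to column i of P2.
    A = [[0.0] * (K * I) for _ in range(I)]
    for i in range(I):
        for k in range(K):
            A[i][i * K + k] = a[k]
    from_P2 = matmul(matmul(A, sigma_P2), transpose(A))
    from_a = matmul(matmul(transpose(P2), sigma_a), P2)
    sigma_g = [[from_P2[r][c] + from_a[r][c] for c in range(I)] for r in range(I)]
    return g, sigma_g


# Hypothetical inputs with K = 2 secondary and I = 2 ultimate classes.
a = [0.6, 0.4]
P2 = [[0.8, 0.2], [0.25, 0.75]]
sigma_a = [[0.0004, -0.0004], [-0.0004, 0.0004]]
diag = [0.016, 0.009375, 0.016, 0.009375]  # binomial variances, column order
sigma_P2 = [[diag[r] if r == c else 0.0 for c in range(4)] for r in range(4)]
g, sigma_g = ultimate_proportions(a, P2, sigma_a, sigma_P2)
print(g)
print(sigma_g)
```

In this example the term arising from the transformation matrix is much larger than the term arising from the secondary proportions, which illustrates why the text suggests looking for a better estimate of the matrix covariance.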


2.1.2.2.2 Continuous Variables


For continuous variables, two important types may be distinguished: 
by class (for the county) and by county. In the by class category, we may 
obtain estimates by primary, secondary or ultimate class definitions. A 
further distinction can be made as to whether the estimate is a "per acre" 
value or a total value. Figure 3 presents a hierarchical organization of 
these types of estimates. 

In the preceding section we have used e, a and g to denote primary,
secondary and ultimate stage proportions and proportion vectors. In the
following we will denote per acre estimates in each of these categories by
$\dot{e}$, $\dot{a}$, $\dot{g}$, and total estimates by $\ddot{e}$, $\ddot{a}$, $\ddot{g}$. Not all of these kinds of
estimates are useful. The most important ones are discussed in the
following.


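ESTIMATE
  |-- By County ------------------------ Per Acre, Total
  |
  `-- By Class (within county)
        |-- Primary -------------------- Per Acre, Total
        |-- Secondary ------------------ Per Acre, Total
        `-- Ultimate ------------------- Per Acre, Total

Figure 3. Types of Estimates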


A. Ultimate Class, Per Acre

The estimate is simply the mean of the desired variable as observed
on the ground, computed over all plots in the ultimate class designation.
Let this mean be denoted by $\bar{y}_i$ (class i), and let $s^2_{\bar{y}_i}$ be its estimated
variance. Then,

$$ \dot{g}_i = \bar{y}_i \quad \text{and} \quad s^2_{\dot{g}_i} = s^2_{\bar{y}_i} \tag{17} $$

B. Ultimate Class, Total

The area for class i is estimated as $\hat{g}_i C$ (Section 2.1.2.2.1), and
hence the estimate for the total is:

$$ \ddot{g}_i = \hat{g}_i \bar{y}_i C \tag{18} $$

where C is the county area. An estimate for its variance is:

$$ s^2_{\ddot{g}_i} = C^2 \left( \bar{y}_i^2 s^2_{\hat{g}_i} + \hat{g}_i^2 s^2_{\bar{y}_i} \right) \tag{19} $$

where $s^2_{\hat{g}_i}$ is the appropriate diagonal element of $\hat{\Sigma}_g$ (Section 2.1.2.2.1).
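The class total and its variance are a direct product-of-estimates calculation: the class proportion times the per acre mean times the county area, with the variance combining the uncertainty of the proportion and of the mean. The sketch below assumes the two are independent and uses invented values; the units (acres, volume per acre) are only an example.

```python
def class_total(g_i, var_g_i, y_bar_i, var_y_bar_i, county_area):
    """Return the estimated class total and its approximate variance.

    total    = g_i * y_bar_i * C
    variance = C^2 * (y_bar_i^2 * var(g_i) + g_i^2 * var(y_bar_i))
    """
    total = g_i * y_bar_i * county_area
    variance = county_area ** 2 * (
        y_bar_i ** 2 * var_g_i + g_i ** 2 * var_y_bar_i
    )
    return total, variance


# Hypothetical class: 25% of a 1000-acre county, mean of 40 units per acre.
total, variance = class_total(
    g_i=0.25, var_g_i=0.0004, y_bar_i=40.0, var_y_bar_i=4.0, county_area=1000.0
)
print(total, variance)
```

In this example the area uncertainty contributes more to the variance of the total than the uncertainty of the per acre mean, which is one reason the area estimation step receives so much attention.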


C. Secondary Class, Per Acre 

The estimator is similar to the one described under A., but the 
ground plots are sorted according to secondary class designations, and the 
means are computed for these classes. 


D. Secondary Class, Total

Similar to B.: replace g with a.

E. Primary Class, Per Acre

A different approach has to be used for this kind of estimate. The 
ground plots cannot be directly related to the cells at the primary stage 
because of misregistration of primary maps and images with respect to the 
ground coordinate system. 

Instead, the relation a' = e'P can be used in reverse, namely:

$$e' = a'P^{-} \tag{20}$$

where $P^{-}$ is the generalized inverse of P. Using this relation on the
vector of secondary class per acre estimates (this vector can be considered
a constant times a set of proportions, with the constant automatically
carried through the matrix multiplication), one obtains a vector of per
acre estimates as follows:

$$\bar{e}' = \bar{a}'P^{-} \tag{21}$$

F. Primary Class, Total

The same transformation method (21) can also be applied to the vector
of total estimates by class.

The primary class estimates can be used to construct a precise legend
for a primary stage classification map. This may be a very important use
for this type of estimate.


G. Entire County, Per Acre

County per acre and total estimates can be based on any of the three
stratifications: primary, secondary or ultimate.

The per acre estimate of the ultimate classification by class was
$\bar{g}_i$. The per acre estimate for the entire county is simply a
weighted average over all classes of these estimates, with the estimated
class proportions as weights:

$$\bar{g} = \hat{\mathbf{g}}'\,\bar{\mathbf{g}} \tag{22}$$

or by (17):

$$\bar{g} = \sum_{i=1}^{I} \hat{g}_i\,\bar{y}_i \tag{23}$$

where I is the number of ultimate classes, and the $\bar{y}_i$'s are the
average parameter estimates for the ultimate classification.

The variance can be estimated as: 



(24) 


where $\Sigma_{\bar{y}}$ is a diagonal matrix with $s^2_{\bar{y}_i}$ on the
diagonal and zeros elsewhere.
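
As a hedged illustration only: if the vector of class proportions and the vector of class means are treated as uncorrelated, a first-order (delta method) variance for the weighted average in (23) has the form below. It has the same two-term structure as (19). The symbol $S_{\hat{g}}$ for the covariance matrix of the proportions follows Section 2.1.2.2.1; the independence assumption and the exact arrangement are assumptions of this sketch, and it is not claimed to be the expression printed as (24).

```latex
s^2_{\bar{g}} \;\approx\;
  \bar{\mathbf{y}}'\, S_{\hat{g}}\, \bar{\mathbf{y}}
  \;+\;
  \hat{\mathbf{g}}'\, \Sigma_{\bar{y}}\, \hat{\mathbf{g}}
  \;=\;
  \sum_{i=1}^{I}\sum_{k=1}^{I} \bar{y}_i\,\bar{y}_k\,(S_{\hat{g}})_{ik}
  \;+\;
  \sum_{i=1}^{I} \hat{g}_i^{2}\, s^2_{\bar{y}_i}
```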


H. Entire County, Total

The estimator of the previous paragraph is simply multiplied by C,
the total county area. Its estimated variance must be multiplied by $C^2$.


2.1.3 Components 

Proposed components for the estimation subsystem are shown in Table 2. 
In the following subsections, each of these components will be discussed. 
The area estimation component ("linear model" of the design concepts
document) will be treated somewhat in depth, since much of the Phase IA
effort was spent on developing a prototype. The function of the other
proposed components was "simulated" by calculating needed intermediate
results with "throwaway" programs. Hence, each of these components will
be discussed only briefly, mostly with respect to its role in a future
MAIS.


2.1.3.1 Area Estimation Component

Three approaches are possible to obtain an estimate of the co-occurrence
or class transformation matrix. First, one may spatially overlay the two
classification maps and compute the area of each of the combined categories.
The proportions of these areas in terms of total area for the classes of one
of the classifications are the elements of the transformation matrix.

If a complete overlay is not possible, one can resort to obtaining a
sample of points and inspecting each point for classification. This is the
method taken in Step 2 for deriving ultimate class estimates. It is also
commonly described in the literature (Isaacson et al., 1979; Sader, 1979;
Todd, 1980).

The third approach is to use a regression model to estimate the trans- 
formation matrix. This is basically possible if there is a sufficient number 
of sample units with proportion vectors a and e. However, working with pro- 
portions presents some basic difficulties, the most notable one resulting 
from the requirement that the predicted proportions must add to one. Another 
difficulty is that a large number of coefficients must be estimated. 

The complete overlay of two complete classifications, if not prohibitive,
would be extremely time-consuming and costly, especially in the polygonal
mode. An overlay using a cell approach would be more reasonable, but then
a polygon to cell conversion would be necessary. The sample approach also
requires an overlay if automated and, if used with individual pixels, will
suffer from registration problems.

The regression method seems to offer several advantages: the entire 
data set is used; a linear model is developed to which a large body of 
statistical "know-how" applies; and the developed matrix can be used 
directly to translate from one classification to another. For these reasons 
the regression approach was selected. How the associated problems were solved 
in the Phase IA development is the subject of the following section. 


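2.1.3.1.1 Theory

A. Conditions

The linear model a' = e'P can be set down in terms of its individual
elements as:

$$
\underbrace{\begin{bmatrix} e_1 & e_2 & \cdots & e_L \end{bmatrix}}_{1 \times L}
\underbrace{\begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1K} \\
p_{21} & p_{22} & \cdots & p_{2K} \\
\vdots & \vdots &        & \vdots \\
p_{L1} & p_{L2} & \cdots & p_{LK}
\end{bmatrix}}_{L \times K}
=
\underbrace{\begin{bmatrix} a_1 & a_2 & \cdots & a_K \end{bmatrix}}_{1 \times K}
\tag{25}
$$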


Since e is a vector of proportions, the following conditions hold:

$$\sum_{i=1}^{L} e_i = 1, \qquad 0 \le e_i \le 1, \qquad i = 1, \ldots, L \tag{26}$$

When the model is used in the prediction mode, the following
conditions must hold for the output vector:

$$\sum_{j=1}^{K} a_j = 1, \qquad 0 \le a_j \le 1, \qquad j = 1, \ldots, K \tag{27}$$

The problem to be solved is how to constrain the elements of P such that
(27) will hold true, not only for members of the sample but for any
prediction.

First it can be seen that, once the elements add to one, the inequality
in (27) can be reduced to:

$$a_j \ge 0 \quad \text{for } j = 1, \ldots, K.$$

Now each element of a is computed as:

$$a_j = \sum_{i=1}^{L} e_i\,p_{ij} \tag{28}$$

The following condition must therefore be satisfied:

$$\sum_{i=1}^{L} e_i\,p_{ij} \ge 0 \tag{29}$$

But $e_i \ge 0$ for i = 1, ..., L.

It is, therefore, necessary and sufficient that $p_{ij} \ge 0$ for all i and
j, for (29) to hold for every admissible e. The first constraint on the
elements of P is thus a nonnegativity constraint.
The second requirement, $\sum_j a_j = 1$, can be translated into a constraint
on the elements of P, as follows:

$$\sum_{j=1}^{K} a_j = \sum_{j=1}^{K}\sum_{i=1}^{L} e_i\,p_{ij}
  = \sum_{i=1}^{L}\sum_{j=1}^{K} e_i\,p_{ij}
  = \sum_{i=1}^{L} e_i \sum_{j=1}^{K} p_{ij} \tag{30}$$

Because $\sum_{i=1}^{L} e_i = 1$, a necessary and sufficient condition for the
$a_j$'s to add to 1 is that $\sum_{j=1}^{K} p_{ij} = 1$ for each i.

Summarizing, for the estimator $\hat{a}_r$ (Section 2.1.2.1), the following
constraints must be enforced in the regression:

$$\sum_{j=1}^{K} p_{ij} = 1 \quad\text{and}\quad p_{ij} \ge 0 \tag{31}$$


The same conditions are also reported by Judge and Takayama (1966) in
their discussion of the cigarette example. Goodman (1953) demonstrated that
the condition $\sum_j p_{ij} = 1$ is automatically satisfied in ordinary least
squares.

Without special precautions, however, negative estimates of $p_{ij}$ will
occur.
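
The effect of the two constraints in (31) can be checked numerically. The sketch below, with hypothetical function names and a made-up 2 x 3 matrix, tests whether a matrix P is admissible (nonnegative entries, rows summing to one) and applies the prediction rule (28) to a proportion vector.

```python
def is_admissible(P, tol=1e-9):
    """True if every p_ij >= 0 and every row of P sums to 1, as in (31)."""
    nonnegative = all(p >= -tol for row in P for p in row)
    rows_sum_to_one = all(abs(sum(row) - 1.0) <= tol for row in P)
    return nonnegative and rows_sum_to_one


def predict(e, P):
    """Secondary proportions a_j = sum_i e_i * p_ij, as in (28)."""
    K = len(P[0])
    return [sum(e_i * row[j] for e_i, row in zip(e, P)) for j in range(K)]


# Hypothetical example: L = 2 primary classes, K = 3 secondary classes.
P = [[0.7, 0.2, 0.1],
     [0.1, 0.6, 0.3]]
e = [0.4, 0.6]          # primary proportions, summing to one
a = predict(e, P)       # predicted secondary proportions
print(a, sum(a))
```

Any e satisfying (26) combined with an admissible P yields a prediction that satisfies (27), which is the point of enforcing the constraints on P rather than on individual predictions.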

A different estimator may need different constraints. To examine the
requirements for $\hat{a}_{rs}$ (Section 2.1.2.1) one may first observe that the
following conditions must hold:



Dp. . >i 


(32) 


and 
where $\bar{e}_i$ and $\bar{a}_j$ represent the primary and secondary sample
average proportions and $e_i$ is the primary class percentage for the entire
county. (The bars over the letters here mean sample averages; they do not
indicate per acre estimates as elsewhere in this document.)

Now it can be seen that the earlier derived constraints do not guarantee
(32) because $e_i - \bar{e}_i$ may be negative. Therefore, a general solution
for $\hat{a}_{rs}$ is not available, but for a specific set of $e_i$'s (county
proportions of primary classes), (32) is a linear condition in the $p_{ij}$'s
which can be enforced with the regression approach described below. Some
algebraic manipulation shows that (33) holds when (31) is enforced. Thus,
$\hat{a}_{rs}$ presents an additional complication when estimating proportions,
with solutions that are less general than those obtained for $\hat{a}_r$.
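
The argument can be followed with a short worked sketch. It assumes that the second estimator has the usual regression form, the sample mean corrected by the difference between the county and sample primary proportions; this form is an assumption made for illustration and is not taken from the conditions (32) and (33) themselves.

```latex
% Assumed form of the estimator (illustrative):
\hat{a}_{rs,j} \;=\; \bar{a}_j + \sum_{i=1}^{L} (e_i - \bar{e}_i)\,p_{ij}

% Nonnegativity is then a linear condition in the p_{ij} for fixed e_i:
\bar{a}_j + \sum_{i=1}^{L} (e_i - \bar{e}_i)\,p_{ij} \;\ge\; 0

% The sum condition follows from the row-sum constraint in (31):
\sum_{j=1}^{K} \hat{a}_{rs,j}
  \;=\; \sum_{j=1}^{K} \bar{a}_j + \sum_{i=1}^{L} (e_i - \bar{e}_i)\sum_{j=1}^{K} p_{ij}
  \;=\; 1 + \Bigl(\sum_{i=1}^{L} e_i - \sum_{i=1}^{L} \bar{e}_i\Bigr)
  \;=\; 1 + (1 - 1) \;=\; 1
```

Under this reading the sign of $e_i - \bar{e}_i$ is what prevents nonnegativity of the $p_{ij}$ alone from guaranteeing a nonnegative prediction.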

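B. Regression Model

Suppose that n PSU-SSU pairs are available from which the P matrix can
be estimated (indexed 1 to N below, with N = n). The following equation
system can then be set up:

$$
\underbrace{\begin{bmatrix}
[e_{11}I] & [e_{21}I] & \cdots & [e_{L1}I] \\
[e_{12}I] & [e_{22}I] & \cdots & [e_{L2}I] \\
\vdots    & \vdots    &        & \vdots    \\
[e_{1N}I] & [e_{2N}I] & \cdots & [e_{LN}I]
\end{bmatrix}}_{(N \times K) \times (L \times K)}
\underbrace{\begin{bmatrix} p_{11} \\ p_{12} \\ \vdots \\ p_{LK} \end{bmatrix}}_{(L \times K) \times 1}
=
\underbrace{\begin{bmatrix}
a_{11} \\ a_{21} \\ \vdots \\ a_{K1} \\
a_{12} \\ a_{22} \\ \vdots \\ a_{K2} \\
\vdots \\
a_{1N} \\ a_{2N} \\ \vdots \\ a_{KN}
\end{bmatrix}}_{(N \times K) \times 1}
+ \varepsilon
\tag{34}
$$

where, for example, $[e_{11}I]$ has the following structure:

$$
[e_{11}I] =
\begin{bmatrix}
e_{11} &        & 0 \\
       & \ddots &   \\
0      &        & e_{11}
\end{bmatrix}_{K \times K}
\tag{35}
$$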
The P matrix has been rearranged as a vector, and the secondary
proportions have been joined into a single column. The vector ε is made
up of random disturbances.

The system (34) can be expressed in the usual regression notation
as:

$$X\beta = Y + \varepsilon \tag{36}$$
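
The stacking that produces X and Y can be sketched in a few lines of Python. The sketch assumes that P is vectorized row by row (p_11, p_12, ..., p_1K, p_21, ...), so that each pair contributes K rows in which the primary proportion e_i multiplies a K x K identity block; the function name and the sample pair are hypothetical.

```python
def build_system(pairs, L, K):
    """Stack PSU-SSU pairs (e, a) into X (N*K rows, L*K columns) and Y (N*K values).

    Row (n, j) of X carries e_i of pair n in the column that belongs to p_ij,
    so that X times vec(P) reproduces a_j = sum_i e_i * p_ij.
    """
    X, Y = [], []
    for e, a in pairs:
        for j in range(K):
            row = [0.0] * (L * K)
            for i in range(L):
                row[i * K + j] = e[i]   # coefficient of p_ij
            X.append(row)
            Y.append(a[j])
    return X, Y


# One hypothetical pair with L = 2 primary and K = 3 secondary classes.
pairs = [([0.4, 0.6], [0.34, 0.44, 0.22])]
X, Y = build_system(pairs, L=2, K=3)
for row in X:
    print(row)
```

Each row contains at most L nonzero entries, which is the sparsity that makes a specialized computation of X'X attractive.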


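As was shown, this system must be subject to a set of a priori
constraints to yield admissible predictions.

Noting that an equality constraint can be written as two inequality
constraints, for instance a = c as $a \ge c$ and $-a \ge -c$, the condition (31)
can be expressed as the following set of inequalities:

$$
\underbrace{\begin{bmatrix}
I_{LK} \\
I_L \otimes \mathbf{1}_K' \\
-\,I_L \otimes \mathbf{1}_K'
\end{bmatrix}}_{((L \times K) + 2L) \times (L \times K)}
\beta \;\ge\;
\underbrace{\begin{bmatrix}
\mathbf{0}_{LK} \\ \mathbf{1}_L \\ -\mathbf{1}_L
\end{bmatrix}}_{((L \times K) + 2L) \times 1}
\tag{37}
$$

Or in matrix form:

$$A\beta \ge c \tag{38}$$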
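
A constraint matrix of the stated size, (L x K) + 2L rows by L x K columns, can be assembled as follows. This is a sketch under the assumption that the nonnegativity rows come first and that each row-sum equality is split into a pair of opposite inequalities; the ordering of the rows and the function name are illustrative choices.

```python
def build_constraints(L, K):
    """Return (A, c) such that A*beta >= c expresses p_ij >= 0 and sum_j p_ij = 1."""
    n = L * K
    A, c = [], []
    # Nonnegativity: one row per coefficient, p_ij >= 0.
    for r in range(n):
        A.append([1.0 if col == r else 0.0 for col in range(n)])
        c.append(0.0)
    # Row sums: sum_j p_ij >= 1 and -sum_j p_ij >= -1 for every primary class i.
    for sign in (1.0, -1.0):
        for i in range(L):
            A.append([sign if i * K <= col < (i + 1) * K else 0.0 for col in range(n)])
            c.append(sign)
    return A, c


def satisfies(A, c, beta, tol=1e-9):
    """Check A*beta >= c row by row, within a small tolerance."""
    return all(sum(a * b for a, b in zip(row, beta)) >= ci - tol
               for row, ci in zip(A, c))


A, c = build_constraints(L=2, K=3)
print(len(A), len(A[0]))   # rows and columns of A
```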

If b is the ICLS estimate for $\beta$, then with the least squares criterion
the following problem must be solved: minimize

$$Z = \tfrac{1}{2}\,(y - Xb)'(y - Xb) \tag{39}$$

subject to

$$Ab \ge c \tag{40}$$

or

$$Ab - v = c \tag{41}$$

where v is a slack vector.

Several approaches to solving the general inequality constrained least
squares problem (ICLS) can be found:

(1) Mixed estimation (Zellner, 1971; Theil, 1963)

(2) Quadratic programming (Lemke, 1962; Liew, 1976; Dantzig, 1967)

(3) Branch and bound methods (Gentle et al., 1980)

(4) Statistical testing of negative proportions (O'Reagan, 1980)

(5) The condition equation approach (van Roessel, 1974)

The quadratic programming approach was selected for use in the MAIS.
It provides a correct solution under all circumstances, and since 1976, an
approach to the calculation of the covariance matrix has been available
(Liew, 1976).

C. Inequality Constrained Least Squares

In this subsection, the ICLS approach will be outlined briefly with
specific emphasis on the derivation of the covariance matrix as proposed by
Liew (1977).

The Kuhn-Tucker conditions specify that at the optimum point the following
conditions hold:

$$(X'X)b - X'y - A'\lambda = 0 \tag{42}$$

$$\lambda'(Ab - c) = 0 \tag{43}$$

$$\lambda \ge 0 \tag{44}$$

where $\lambda$ is a vector of Lagrangian multipliers. If $\lambda$ is known, b can
be computed as:

$$b = (X'X)^{-1}(X'y + A'\lambda) \tag{45}$$

Substituting into (41), one obtains

$$A(X'X)^{-1}X'y - c + A(X'X)^{-1}A'\lambda = v \tag{46}$$

or defining

$$q = A(X'X)^{-1}X'y - c = A\hat{\beta} - c \tag{47}$$

where $\hat{\beta}$ is the OLS estimate and:

$$W = A(X'X)^{-1}A' \tag{48}$$

one obtains the so-called fundamental problem, namely: find v and $\lambda$, such
that:

$$v = W\lambda + q, \qquad v'\lambda = 0, \qquad \lambda \ge 0 \tag{49}$$
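
For a handful of constraints the fundamental problem can be solved by plain enumeration, which is useful for checking a production solver on small cases. The sketch below is not the Lemke or the Dantzig-Cottle algorithm: it simply tries every candidate set of active constraints, solves for the multipliers on that set, and accepts the first candidate for which both the multipliers and the slack vector are nonnegative (the slack being nonnegative because of (40)). All names and the 2 x 2 example are hypothetical, and the cost grows as 2^m.

```python
from itertools import combinations


def solve_linear(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting.

    Returns None when the matrix is numerically singular.
    """
    n = len(rhs)
    aug = [list(M[r]) + [rhs[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def solve_fundamental(W, q, tol=1e-9):
    """Find (v, lam) with v = W*lam + q, v'lam = 0, v >= 0, lam >= 0."""
    m = len(q)
    for size in range(m + 1):
        for active in combinations(range(m), size):
            lam = [0.0] * m
            if active:
                # On the active set the slack is zero: W_aa * lam_a = -q_a.
                sub = [[W[r][col] for col in active] for r in active]
                sol = solve_linear(sub, [-q[r] for r in active])
                if sol is None:
                    continue
                for idx, val in zip(active, sol):
                    lam[idx] = val
            v = [sum(W[r][col] * lam[col] for col in range(m)) + q[r] for r in range(m)]
            if all(x >= -tol for x in lam) and all(x >= -tol for x in v):
                return v, lam
    return None


# Hypothetical 2-constraint example.
W = [[2.0, 1.0],
     [1.0, 2.0]]
q = [-4.0, 1.0]
v, lam = solve_fundamental(W, q)
print(v, lam)
```

In this example only the first constraint is active: its multiplier is positive and its slack is zero, while the second constraint has positive slack and a zero multiplier.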

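Two algorithms for solving this problem are available, one by Dantzig
and Cottle (1967) and the other by Lemke (1962). The Lemke algorithm was
programmed for the Phase IA test. The equation $v = W\lambda + q$ can be written
in matrix form as:

$$
\underbrace{\begin{bmatrix} I & -W \end{bmatrix}}_{m \times 2m}
\underbrace{\begin{bmatrix} v \\ \lambda \end{bmatrix}}_{2m \times 1}
= q
\tag{50}
$$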
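If there are m constraints, then v and $\lambda$ are m x 1 vectors, and hence
$[v'\ \lambda']'$ is 2m x 1. But since the inner product of v and $\lambda$ is zero,
the near zero elements can be eliminated from v and $\lambda$, and (50) can be
reduced to:

$$
\begin{bmatrix} I_1 & -W_2 \end{bmatrix}
\begin{bmatrix} v^{\circ} \\ \lambda^{\circ} \end{bmatrix}
= q
\tag{51}
$$

where q is m x 1, $v^{\circ}$ and $\lambda^{\circ}$ hold the remaining nonzero
elements, and $I_1$ and $W_2$ hold the corresponding columns of I and W.
Now one can "solve" for $[v^{\circ\prime}\ \lambda^{\circ\prime}]'$ as follows:

$$
\begin{bmatrix} v^{\circ} \\ \lambda^{\circ} \end{bmatrix}
= \begin{bmatrix} I_1 & -W_2 \end{bmatrix}^{-1} q
= \begin{bmatrix} M_1 \\ M_2 \end{bmatrix} q
\tag{52}
$$

or, in particular, by (47)

$$\lambda^{\circ} = M_2\,(A\hat{\beta} - c) \tag{53}$$

The following breakdown can also be made:

$$
A'\lambda = \begin{bmatrix} A_1' & A_2' \end{bmatrix}
\begin{bmatrix} 0 \\ \lambda^{\circ} \end{bmatrix}
= A_2'\lambda^{\circ}
\tag{54}
$$

Substituting (53) and (54) into (45), one obtains:

$$b = \hat{\beta} + (X'X)^{-1}A_2'M_2\,(A\hat{\beta} - c) \tag{55}$$

or

$$b = \left[\,I + (X'X)^{-1}A_2'M_2A\,\right]\hat{\beta} - (X'X)^{-1}A_2'M_2\,c$$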
Designating the multiplier of $\hat{\beta}$ in the expression for b as M, the
covariance matrix of b is:

$$\Sigma_b = M\,V(\hat{\beta})\,M' = \sigma^2 M (X'X)^{-1} M' \tag{56}$$

and its estimator is as follows:

$$S_b = s^2 M (X'X)^{-1} M' \tag{57}$$

where $s^2 = \sum_{i=1}^{nK} e_i^2 / \nu$ and $e_i = \hat{a}_i - a_i$.

Normally $\nu$ is computed as n - k, where n is the number of observations
and k is the number of estimated parameters. In the proportion estimation
case, $\nu$ is computed as follows:

$$\nu = n(K - 1) - K(L - 1) \tag{58}$$

since each vector of proportions has an implied mean. Here $\hat{a}_i$
represents a secondary class sample proportion prediction, and $a_i$ is its
observed value.
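
The residual variance with the adjusted degrees of freedom of (58) is a short computation. The sketch below assumes that predictions and observations are supplied as flat lists of n*K secondary proportions in the same order; the function name and the small data set are hypothetical.

```python
def residual_variance(predicted, observed, n, K, L):
    """Return (s2, nu) with nu = n(K - 1) - K(L - 1) as in (58).

    predicted, observed: flat sequences of n*K secondary class proportions.
    """
    if len(predicted) != n * K or len(observed) != n * K:
        raise ValueError("expected n*K predicted and observed values")
    nu = n * (K - 1) - K * (L - 1)
    if nu <= 0:
        raise ValueError("not enough observations for the number of classes")
    sse = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return sse / nu, nu


# Hypothetical data: n = 3 sample units, K = 2 secondary and L = 2 primary classes.
predicted = [0.5, 0.5, 0.4, 0.6, 0.7, 0.3]
observed = [0.6, 0.4, 0.4, 0.6, 0.6, 0.4]
s2, nu = residual_variance(predicted, observed, n=3, K=2, L=2)
print(s2, nu)
```

With the dimensions quoted elsewhere in this report (L = 10, K = 14) and 60 sample units, the same formula gives 60 x 13 - 14 x 9 = 654 degrees of freedom.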

A coefficient of multiple determination can be computed as: 



where 1/K is the average of all secondary class proportions in the sample. 


Similarly, an F statistic for the significance of the linear relation can
be computed as:

$$F = \frac{\sum_{i=1}^{nK} (\hat{a}_i - 1/K)^2}{s^2\,K(L - 1)} \tag{60}$$

2.1.3.1.2 Programs

The theory of the previous section was incorporated in the program
ESTPRP, written in Fortran IV. This program currently runs on the PRIME 550
at the contractor's Berkeley facility. As was mentioned in Section 2.1.1.2,
one of the problems associated with proportion estimation is the large
number of coefficients to be estimated. For instance, if L = 10 and
K = 14, 140 coefficients must be determined, and the associated covariance
matrix is of size 140 x 140. Several other matrices of this dimension are
also required in the course of computation. The memory requirements of
the current program are approximately 800,000 bytes, allowing for the
estimation of a maximum of 300 coefficients. This presents no problem on
the PRIME 550, because it has a virtual memory; however, should it be
desired to run the program on a machine with a more modest, fixed-size
memory, an additional programming effort will be necessary.

The ESTPRP program is based on a general ICLS solution, as implemented 
in subroutine ICLS. This was done because the exact form of the constraint 
matrix was not known at the beginning of the project. Therefore, considerable 
space savings can be achieved by implementing a more specific proportion 
estimation solution. 

Already, a considerable reduction of needed memory was obtained by
an algorithmic computation of the initial X'X matrix, rather than using
straightforward matrix multiplication. The X'X matrix is a sparse matrix,
and very likely a special inversion algorithm designed for this type of
matrix can perform in a fraction of the time currently used by the
generalized inverse routine RUST. The routine ICLS calls the routine
LEMKE, which contains the Lemke algorithm.

At the beginning of the project it was not known whether the row sum 
constraint of the P matrix would hold implicitly, or whether the constraint 
was to be expressed in the A matrix. The latter course was chosen, and 
rightly so, as it appeared that these constraints were active in most 
cases tested. 

Because a computer is not a perfect mathematical machine, the entire
ICLS approach was implemented with appropriate tolerances. For example, the
row sum constraint is enforced as follows:

$$1 - \epsilon \;\le\; \sum_{j=1}^{K} p_{ij} \;\le\; 1 + \epsilon$$

with $\epsilon$ = 0.0001. The error in the sum of predicted proportions was
generally less than 0.0005.

The PRIME 550 is a 32-bit machine; all computations are therefore 
performed in double precision. 

The program ESTPRP writes the estimated P matrix and its estimated
covariance matrix to disk for further processing by program ESTCNT. ESTCNT
takes the county proportions (see Section 2.2.2.2.6) and the output matrices
of ESTPRP and computes secondary class proportions for the county according
to equations (6) and (7), Section 2.1.2.1. It, in turn, writes the estimated
county proportions and their estimated covariance matrix to the disk for
further processing.

2.1.3.1.3 Program Testing

Although the "linear model" concept of the Step 1 area computations is
simple and elegant, its details as implemented through the ICLS approach
are quite complex. The greatest precautions were taken, therefore, to
assure that the programming was correct and that the program will
continue to function properly. This was accomplished by (a) computing
known test cases, and (b) incorporating test computations into the
program.

Lemke (1963) provides a numerical example, beginning with the matrix
W. It was used to test the LEMKE subroutine. Liew (1976) gives an example
of a covariance computation for an economics case. An attempt was made
to reconstruct this test computation, but it failed as the reference
material in the paper was inadequate. An example was therefore requested
directly from the author; a data set was received together with a test set
made with the author's package at the University of Oklahoma (Liew and
Shim, 1978). The covariance matrix of this set was duplicated on the
PRIME 550. The differences between the computed elements of the covariance
matrices were generally less than 0.0003 for elements with a magnitude of
0.02.

The test computations built into the program check various relations
that should exist in the course of the computation, such as (49), (52) and
(54) of Section 2.1.3.1.1. In addition, matrix inverse computations are
checked by multiplying the result by the original matrix, and matrices to
be inverted are inspected for rank deficiency. During development of the
program, test computations provided insight into the tolerances to be
used; in future use, they will quickly abort the program in the
eventuality of storage problems or other environmental defects.
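
A built-in check of the kind described, multiplying a computed inverse by the original matrix and comparing the product with the identity, can be sketched as follows. The tolerance value and the function names are illustrative assumptions and are not taken from ESTPRP.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[r][k] * B[k][col] for k in range(len(B)))
             for col in range(len(B[0]))]
            for r in range(len(A))]


def is_inverse(A, A_inv, tol=1e-6):
    """True if A * A_inv equals the identity matrix within the tolerance."""
    product = matmul(A, A_inv)
    n = len(A)
    return all(abs(product[r][col] - (1.0 if r == col else 0.0)) <= tol
               for r in range(n) for col in range(n))


# Hypothetical 2 x 2 example with a known inverse.
A = [[4.0, 7.0],
     [2.0, 6.0]]
A_inv = [[0.6, -0.7],
         [-0.2, 0.4]]
print(is_inverse(A, A_inv))
```

A failed check of this kind is cheap to detect and is a reasonable trigger for aborting a run before corrupted intermediate results propagate.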

2.1.3.2 Field Statistics Component

This component, as currently implemented in programs CSTAT and DSTAT,
computes the sample averages, by class, and the estimated standard deviations
for the parameters of interest (CAI). (See Section 2.1.2.2.2, subparagraph A.)
It also computes $\hat{g}$ and $S_{\hat{g}}$ (see Section 2.1.2.2.1).

The procedure currently in use is as follows: each field plot is
interpreted as to its secondary (aerial photo) attributes. A file is then
constructed which assigns two sets of attributes to each plot. The first set
is the PI code; the second set is derived from the basic plot and field
observations and consists of the plot estimates of the parameters of interest.
Currently this second set contains: CAI, Ground Land Use Code and Forest
Type Code; the latter as defined by McClure, Cost and Knight (1979).

Plots can then be sorted on various combinations of PI attributes in
the identical manner in which the SSU resource units are sorted for a desired
stratification. (See Section 2.1.3.5.) For continuous variables, averages
and standard deviations are computed by these classes (Program CSTAT).
For discrete data, a P matrix and a covariance matrix are computed (Program
DSTAT).
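
The per-class summary for a continuous variable can be sketched as below. This is an illustration in Python of the kind of computation described for CSTAT, not a transcription of that program; the class labels and plot values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def class_statistics(plots):
    """Summarize plot values by class.

    plots: iterable of (class_code, value) pairs, one per field plot.
    Returns {class_code: (number_of_plots, mean, standard_deviation)}.
    The standard deviation is NaN for a class with a single plot.
    """
    groups = defaultdict(list)
    for code, value in plots:
        groups[code].append(value)
    return {code: (len(values),
                   mean(values),
                   stdev(values) if len(values) > 1 else float("nan"))
            for code, values in groups.items()}


# Hypothetical plots: (class derived from the PI code, plot value such as CAI).
plots = [("pine", 40.0), ("pine", 50.0), ("pine", 60.0),
         ("hardwood", 20.0), ("hardwood", 30.0)]
stats = class_statistics(plots)
print(stats)
```

Changing the stratification only changes how the class code is derived for each plot; the summary step itself stays the same.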

If, for security reasons, plot data are not to be stored in the lower
level GIS, the field statistics component can be the primary receptacle for
all field data and can be the main instrument for all field data-related
processing. Thus, all field data manipulations currently performed by
the Plot Summary Program at the Southeastern Experiment Station can be
a part of the field statistics component.

2.1.3.3 Summary Component

The functions of this component simulated in the Phase IA test are
currently performed by program ESTSMR. The possible functions are the
computations of the estimators and variances described in Section 2.1.2.2.
Program ESTSMR takes the outputs of programs ESTCNT, CSTAT and DSTAT, and
computes the final estimates and their estimated variances.

2.1.3.4 Primary Proportion Extraction

One alternative is to consider the proportion extraction tasks as a
part of GIS-related processing. However, it is proposed that these tasks
be a part of the estimation subsystem. A compelling reason for
this recommendation is that the calculated proportions are for classifications
directly related to the types of estimates to be made, to be specified at
request time. These classifications can then be defined as the estimation
subsystem is invoked, and the proportions can be generated accordingly.

The manner in which primary (LANDSAT) PSU proportions were extracted for
the current test is described in Section 2.2.2.2.4. A program EPREP was
written to further combine classes generated by the program COUNT described
in that section.


2.1.3.5 Secondary Proportion Extraction

The LANDPAK report generator is a versatile program capable of sorting
and summarizing resource unit data in a variety of ways. For the purpose
of the test, an output option was built into this generator which allows
one to summarize data onto a file for further processing. Using this
option, a file is generated which for each RU contains the RU number, the
LANDPAK-computed area, the SSU number and the PI code for the RU. A
program JPREP was then created that sorts the RU's in this file according
to a given classification, computes the proportions of the total area of
each class of the SSU, and outputs a list of these proportions by SSU to
the disk for further processing. Program JPREP must be modified for each
additional classification, and as such is actually part of the file. A
corresponding modification must be made in CSTAT. A user-friendly way of
specifying a desired classification must be part of a permanent MAIS.
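
The proportion extraction step can be sketched as follows. The record layout follows the description above (RU number, area, SSU number, PI code), while the function name, the PI codes and the classification rule are hypothetical. Passing the classification as a function is one possible way to avoid modifying the program for every new classification.

```python
from collections import defaultdict


def class_proportions(resource_units, classify):
    """Compute area proportions by class for each SSU.

    resource_units: iterable of (ru_number, area, ssu_number, pi_code).
    classify:       function mapping a PI code to a class label.
    Returns {ssu_number: {class_label: proportion of the SSU area}}.
    """
    area = defaultdict(lambda: defaultdict(float))
    for _ru_number, ru_area, ssu_number, pi_code in resource_units:
        area[ssu_number][classify(pi_code)] += ru_area
    result = {}
    for ssu_number, by_class in area.items():
        total = sum(by_class.values())
        result[ssu_number] = {label: a / total for label, a in by_class.items()}
    return result


# Hypothetical records for one SSU; PI codes starting with "F" stand for forest.
records = [(1, 300.0, 20, "F1"),
           (2, 100.0, 20, "C1"),
           (3, 240.0, 20, "F2")]
proportions = class_proportions(
    records, classify=lambda code: "forest" if code.startswith("F") else "other")
print(proportions)
```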


2.2 Prototype Test 


The Phase IA test is described in this section. The report of this effort
is divided into two main parts: input and data preparation of the upper and
lower level GIS, and analysis of the Step 1 and Step 2 computations of the
prototype estimation subsystem. Once again, it should be emphasized that the
current version of this subsystem, with the exception of the program ESTPRP,
consists mainly of "throwaway" programs specially created for the Phase IA test.

The description of the testing effort is preceded by a short description 
of the test area: Kershaw County in South Carolina. 

2.2.1 Test Area Description 

Kershaw County, South Carolina, was selected as the site for the Phase IA
pilot test. Situated in the north-central region of the state, it offers a
variety of forest types and land use regimes, making it suitable for
evaluating new, multiresource inventory methods.

Kershaw County has a total land area of 499,840 acres, of which
approximately 395,000 are forested (G. C. Craver, 1978). It contains three
physiographic regions: the Southern Coastal Plain, Carolina Sand Hills and the
Southern Piedmont.

The predominant forest types are loblolly-shortleaf pine, oak-gum-cypress,
oak-hickory, and oak-pine. The distribution of these types is controlled mainly
by soil type and moisture. Broadly, hardwoods occur in the drains, with
increasing occurrence of pine on drier ground. The influence of man on the
distribution of forest species is significant.

Reforestation of abandoned cotton fields throughout the county has been
encouraged, beginning in the 1950's. Species planted consist chiefly of
loblolly and shortleaf pines. Much of the land on upland sites is cropland and
pasture.

The range in elevation is from 100 feet to about 500 feet. The climate
is mild, due to the influence of the Gulf of Mexico and the Atlantic Ocean.
Winters are humid and mild, with average January temperatures of 45°F, although
occasional periods of frost and freezing temperatures occur. Summers are warm
and humid, with average July temperatures of 80°F. Mean annual total rainfall
is approximately 50 inches.

2.2.2 Input and Data Preparation 

2.2.2.1 Lower Level GIS

2.2.2.1.1 Sample Selection

The sampling configuration decided upon includes the 210 Forest Service
field plots. These are termed Ultimate Sample Units (USU's). In addition, a
random sample of aerial photo sample units was selected. These are termed
Secondary Sample Units (SSU's). The SSU's are square, measuring one mile on
an edge on the ground.

The SSU's were selected by first selecting (randomly, without replacement)
a subset of the 210 ground plots. All of the ground plots classified by the
Forest Service as other than forest or cropland were deliberately included in
the sample. This was an attempt to improve the distribution of sample plots
among the classes, as these classes were inadequately represented in the ground
sample.

Around each USU selected (we chose a total of 60), an SSU was randomly
located so that the USU was contained within the SSU boundary, with the SSU
sides aligned to the directions of the compass.

2.2.2.1.2 Photo Interpretation

The aerial photographs provided are panchromatic black and white, taken
in April, 1975. The 1:20,000 scale prints were enlarged from 1:40,000
nine-inch negatives. The image quality ranges from fair to poor, with
graininess and low contrast being the main deficiencies. Incomplete stereo
coverage for more than half of the SSU's was also a problem. High quality
color IR optical bar photography, flown in 1979, was used to supplement photo
coverage of these problem areas.

A list of land use classes was devised based on the requirements described
in the RFP. Image quality in conjunction with the interpreters' ability to
discriminate between these classes was the basis for formulating the set of
aerial photo classes used. (See Appendix B.)

Training was accomplished by first selecting training sites, which covered
the range of conditions identifiable on the photos. These sites were then
interpreted using a 1-3 power mirror stereoscope.

A field trip was made to the test area in July, 1980. EarthSat personnel
spent 6 days in the area. The training sites were checked for correct
interpretation, and new training sites were established. The training sites
were described and documented using stereo pairs of color photos taken from
ground stations. Some training sites were documented by low-altitude, oblique
aerial photos taken during a reconnaissance and photo flight chartered by
EarthSat. These documents, keyed to the aerial photos, were the main reference
and training aid used in the photo interpretation.

In addition to the interpretation of the SSU's, the remaining USU's were
classified as to photo class. Thus, information was compiled for 60 SSU's and
210 USU's. An interpreted photo for SSU No. 20 is illustrated in Figure 2.

2.2.2.1.3 Data Entry

The lower level GIS consists chiefly of the LANDPAK system. The first
step in entering data from source maps is to establish geometric control for
that map. This consists of establishing enough points on the source map, of
known ground coordinates, to ensure a good transformation from digitizer table
coordinates to ground coordinates. This is between 4 and 8 points, usually 4
if the source map is a controlled map and 6 to 8 if the source map is a
delineated photo or other uncontrolled map.

Control points were derived from 7½-minute U.S.G.S. topographic maps if
available; otherwise, 15-minute maps were used. An error of 15 meters RMS was
the tolerance allowed for any transformation.
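
The tolerance test on a transformation can be sketched as follows. The sketch assumes that the RMS error is taken over the horizontal distances between the transformed control points and their known ground coordinates; that definition, the function name and the coordinates are illustrative assumptions, not a description of the LANDPAK control procedure.

```python
import math

TOLERANCE_M = 15.0   # RMS tolerance in meters, as quoted above


def rms_error(transformed, known):
    """Root mean square of the point-to-point distances (same units as the input)."""
    if len(transformed) != len(known) or not known:
        raise ValueError("need the same, nonzero number of points in both lists")
    squared = [(x1 - x2) ** 2 + (y1 - y2) ** 2
               for (x1, y1), (x2, y2) in zip(transformed, known)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical control points in ground coordinates (meters).
transformed = [(1000.0, 2000.0), (1500.0, 2600.0), (2100.0, 2050.0), (1800.0, 1200.0)]
known = [(1006.0, 2008.0), (1500.0, 2600.0), (2109.0, 2062.0), (1800.0, 1205.0)]
error = rms_error(transformed, known)
print(error, error <= TOLERANCE_M)
```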

During the control procedure, the ground locations of the SSU centers were 
established. These centers were used to generate control units, the control 
layer for LANDPAK data. At this point, digitizing and entry into the LANDPAK 
system of the SSU type-maps could proceed. These type-maps, when entered, con- 
stitute the covertype layer. Similarly, elevation contour lines were digitized 
and entered to form the topography layer. 

The data items entered for the covertype layer consist of elements of the
PI codes assigned to each delineation or resource unit (RU). Thus for
covertype, 5 data items were entered for each RU, including values denoting
unknown or irrelevant status. Stored data items for the topography layer
consisted of contour line elevations. Figures 3 and 4 illustrate covertype and
topography layer images, respectively, which were retrieved from the database
and plotted for SSU No. 20.

2.2.2.1.4 Data Compilation for Soil Loss Estimates

A model was selected to demonstrate an application of these methods to soil
erosion potential estimation. The model, the Universal Soil Loss Equation
(USLE), is expressed as:

    A = RKLSCP

where:

    A is the average annual soil loss in tons/acre/year from the site
    R is the value of the rainfall erosivity index for the site
    K is the value of the soil erodibility index for the site
    L is the length of slope factor
    S is the steepness of slope factor
    C is the vegetation cover influence factor
    P is the erosion control practice factor.

(For more information on applying the USLE, refer to USDA, SCS Technical Notes,
January 1, 1973.)

Average values for R, K and L for each of two major regions in Kershaw
County were taken from a previous study (Dissmeyer and Stump, 1978). Each SSU
fell into one of the regions for purposes of assigning these factors. Efforts
to include soil-type maps in the database were abandoned after delays in
receiving requested data. This would have facilitated site-specific
determinations for the K factor. Attempts to approximate K values by SSU were
also abandoned for lack of data.

Pertinent information existed in the LANDPAK database in the form of the
covertype and topography layers. This provided a means of assigning C and S
values. Slope-class maps were generated using the topography layer and LANDPAK
subsystems. The following class limits, in per cent slope, were devised:

    1.  0 - 2
    2.  2 - 4
    3.  4 - 8
    4.  8 - 16
    5.  16 - 30
    6.  30+

The slope-class maps were inserted into the database and constitute the
slope-class layer. A slope-class map retrieved from the database and plotted
can be seen in Figure 5. Values for S, for assignment to each slope-class
(Table 1, Technical Notes, 1978), were selected.

Representative values for C were assigned based on each RU's covertype
PI code. These values were approximated from published suggested values
(Technical Notes, 1978). We feel it should be noted here that the PI classes
were not configured for determining these values. We chose a C value which was
our best approximation for an average value for that PI class. Similarly, we
were not in a position to determine P values. Since P applies chiefly to
agriculture and is really an adjustment to account for efforts to minimize
erosion, we feel its omission is relatively unimportant, but it should
ultimately be included to account for certain forest practices.

2.2.2.1.5 Derivation of Soil Loss Sample Set

This section describes the processing of the soil loss data using LANDPAK
and special programs up through the interface with the estimation subsystem.

The slope-class and covertype layer images were retrieved from the
database, and an overlay operation was performed to intersect these two map
layers. The resulting map, a new layer referred to as the selection map,
consists of new RU's of homogeneous slope-class and PI class. Figure 6 is a
plot of this selection map for SSU No. 20. A LANDPAK report was generated for
this selection map and stored in an auxiliary file. This report contains
information on each RU as to land area, PI code and slope-class.

This report was input into EROS, a modified version of computer program
JPREP2. EROS assigns the appropriate values to factors of the USLE. EROS
solves the USLE and assigns the RU to a soil-loss class. The soil-loss class
limits were assigned (in tons/acre/year) as follows:

    1.  0 - 2
    2.  2 - 4
    3.  4 - 8
    4.  8 - 16
    5.  16 - 32
    6.  32 - 64
    7.  64+

The program accumulates areas by each of these classes. The proportion of
the total area contained in each of these classes is then computed. This table
of class proportions is the output from EROS, which is stored for use by the
estimation subsystem. Essentially, this process is carried out by SSU. The
whole process was carried out for only 12 SSU's due to constraints of time and
resources.
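
The processing described for EROS (evaluate the USLE per resource unit, assign a soil-loss class, accumulate area by class and convert to proportions) can be sketched as follows. This is an illustration in Python, not the EROS program: the factor values are hypothetical, P defaults to 1 to mirror its omission, and the treatment of a value that falls exactly on a class limit (assigned to the higher class) is an assumption.

```python
from bisect import bisect_right
from collections import defaultdict

# Upper limits (tons/acre/year) of soil-loss classes 1 to 6; class 7 is open-ended.
CLASS_LIMITS = [2.0, 4.0, 8.0, 16.0, 32.0, 64.0]


def usle(R, K, L, S, C, P=1.0):
    """Average annual soil loss A = R*K*L*S*C*P, in tons/acre/year."""
    return R * K * L * S * C * P


def soil_loss_class(a):
    """Class number 1..7; a value on a class limit goes to the higher class."""
    return bisect_right(CLASS_LIMITS, a) + 1


def soil_loss_proportions(units, R, K, L):
    """units: iterable of (area, S, C), one entry per resource unit of an SSU.

    Returns {soil_loss_class: proportion of the total area}.
    """
    area_by_class = defaultdict(float)
    for area, S, C in units:
        area_by_class[soil_loss_class(usle(R, K, L, S, C))] += area
    total = sum(area_by_class.values())
    return {cls: a / total for cls, a in sorted(area_by_class.items())}


# Hypothetical resource units: (area in acres, S from slope class, C from PI code).
units = [(10.0, 0.2, 0.01), (30.0, 1.0, 0.1), (60.0, 0.5, 0.004)]
proportions = soil_loss_proportions(units, R=250.0, K=0.2, L=1.0)
print(proportions)
```

The resulting table of proportions per SSU has the same form as the secondary class proportions, so it can be passed to the estimation subsystem without further conversion.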



Figure 6 














2.2.2.2 Upper Level GIS

This section describes the procedures used for the input and manipulation
of LANDSAT data in the upper level GIS. The upper level functions used in this
project included LANDSAT preprocessing, unsupervised classification, image
registration, statistical tabulation and map generation. With the exception of
unsupervised classification and digitization of the county boundary, all
processing was performed at EarthSat's Washington, D.C., office on a PRIME 450
minicomputer. Classification was performed at the Remote Sensing Laboratory
at the University of California at Berkeley. The digitizing of the Kershaw
County boundary was done by EarthSat, Berkeley, personnel using LANDPAK.

2.2.2.2.1 LANDSAT Preprocessing

The Kershaw County study area is contained within a single LANDSAT frame
of path 17 or 18, row 36. Two scenes were available: 11035-15054 (LANDSAT 1,
May 1975) and 30515-15201 (LANDSAT 3, August 1979). The LANDSAT 1 scene was
chosen because it was available in the old CCT format and could be destriped
with EarthSat's scan line suppression software. The LANDSAT 3 scene was
available only in corrected MDP format, which cannot be destriped because the
geometric correction process destroys the detector identities.

EarthSat's LANDSAT preprocessing program, CCTRFM, performs three functions
(reformatting, scan line suppression and geometric correction) in a 2-pass
procedure. In the first pass the four spectral bands are separated into
different files, and the four vertical strips are pieced together. At the same
time, the radiometric calibration introduced by NASA is removed and a histogram
is acquired for each of the 24 detectors. Four calibration lookup tables are
computed in order to match the six detectors for each spectral band. In the
second pass, the image is recalibrated and resampled (nearest neighbor) to
correct for mirror velocity profile, earth curvature and panoramic distortions.
Synthetic pixels, extra pixels inserted by NASA to attain consistent line
length, are deleted during this pass.
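
One common way to match detectors from their histograms is to equate
cumulative distributions. The sketch below illustrates that idea only. It is
an assumption that CCTRFM works this way; the report does not describe the
internals of its scan line suppression, and the function names and the choice
of reference histogram are hypothetical.

```python
from itertools import accumulate


def cumulative(hist):
    """Return the cumulative distribution of a histogram of pixel counts."""
    total = sum(hist)
    return [c / total for c in accumulate(hist)]


def matching_lut(detector_hist, reference_hist):
    """Build a lookup table mapping detector levels to reference levels.

    Each input level is mapped to the lowest reference level whose
    cumulative frequency is at least that of the input level.
    Assumption: the reference is, for example, the pooled histogram of
    the six detectors of one spectral band.
    """
    src = cumulative(detector_hist)
    ref = cumulative(reference_hist)
    lut, j = [], 0
    for p in src:
        while j < len(ref) - 1 and ref[j] < p:
            j += 1
        lut.append(j)
    return lut


# Hypothetical 4-level histograms: the detector reads two levels too high.
shifted_lut = matching_lut([0, 0, 4, 4], [4, 4, 0, 0])
identity_lut = matching_lut([1, 2, 3, 4], [1, 2, 3, 4])
```

A detector whose histogram already equals the reference receives the identity
table, so only mismatched detectors are altered.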

The corrected LANDSAT tape generated by CCTRFM was then used as input to 
the classification and registration steps. 

2.2.2.2.2 Classification

Classification of the LANDSAT digital data was performed on the Data General
NOVA 840 of the Space Sciences Laboratory of the University of California at
Berkeley, with program CLUSTER. It employs the ISODATA algorithm developed by
Ball and Hall, and is about five generations removed from the original program
(Ritter and Kaugars, 1978).

An envelope for Kershaw County was defined using the interactive image 
processing system, and an initial clustering was performed on a 4% sample of 
the points in the envelope. 

The maximum standard deviation for the splitting of a cluster was set at 
0.6, and the minimum distance for combining clusters was defined at 3.2. The 
run terminated cleanly without the iterative splitting and combining that 
occurs at times, with an average cluster standard deviation of 0.3779, and an 
average intracluster distance of 15.04. A total of 14 clusters was generated.

TABLE 3

Percentage of Points in Kershaw County Envelope
Assigned to Each Cluster

    Cluster     Per Cent

       1           0.3
       2          25.5
       3          22.2
       4          19.3
       5           6.7
       6           7.9
       7           4.0
       8           0.6
       9           2.2
      10           8.6
      11           0.7
      12           0.8
      13           0.5
      14           0.3


Table 3 lists the clusters and the percentages of points assigned to each. 
It is interesting to note that clusters 2, 3 and 4 contain almost 70% of the 
points in the sample. 

After the initial cluster run, the program was restarted at a later date,
and all points in the area were assigned a cluster number. An output tape with
the resultant image was forwarded for processing to the EarthSat, Washington,
D.C., office.

In the following, we will refer to the cluster and its associated points
as a spectral class. By means of an inspection of the classification image,
initial land use assignments were made for each spectral class.

TABLE 4

Initial Land Use Cluster Assignments

    Cluster     Land Use Category

       1        Bare Soils
       2        Hardwood Pine Mix
       3        Bottom-land Hardwoods
       4        Pine Hardwood Mix
       5        Cropland
       6        Pine
       7        Bare Soil
       8        Wetland
       9        Bare Soil
      10        Cropland
      11        Water
      12        Bare Soil
      13        Water
      14        Bare Soil


The number of spectral classes was judged too high for further analysis, and
so, after the SSU proportions were extracted, spectral classes were combined
on the basis of their intracluster distance, the number of points in each
class, and the initial land use class assigned to the cluster.

Two sets of classifications were ultimately used, as shown in Table 5,
one with 7 classes and one with 10 classes. We will refer to these
classifications as the 7- and 10-class spectral classifications.


TABLE 5

Composition of Spectral Classifications Used in ICLS Estimations

    Spectral    7 Spectral Classes    10 Spectral Classes
    Class       Cluster(s)            Cluster(s)

       1        1, 12, 14             1, 12, 14
       2        2                     2
       3        3                     3
       4        4                     4
       5        5, 7, 9, 10           5
       6        6                     6
       7        8, 11, 13             7, 9
       8                              8
       9                              10
      10                              11, 13

2.2.2.2.3 Image Registration

In this step, ground control points (GCP's) were located to develop a
relationship between LANDSAT coordinates (line, column) and the UTM coordinate
system (northing, easting). 7½-minute maps were obtained for Kershaw County
and the surrounding area. Unfortunately, a large section of the county is
covered only by a 15-minute sheet (Camden) which was last updated in 1939.
This was unacceptable for our purposes, so no ground control was available for
that area of the county.

Five GCP's were located on the 7½-minute maps and the LANDSAT image. Since
the study area covered about 1/9 of a LANDSAT scene, this was equivalent to 
45 GCP's for a full scene. EarthSat's program SHADE was used to produce line 
printer maps of those portions of the LANDSAT image (band 5) which contained 
the control points. The line and column coordinates for each point were meas- 
ured from these line printer maps. UTM coordinates were measured from the 
U.S.G.S. maps. 


Ground Control Points

         LANDSAT                      UTM
    Line      Column           N              E

    1471       371         3,810,874       520,071
    1126       116         3,840,521       512,163
    1094       167         3,842,405       515,627
    1711       659         3,789,234       531,929
    1367       193         3,821,109       511,790


These points were entered into EarthSat's program AFFINE to determine the
affine coefficients which would map them with the minimum RMS error. The affine
transformation has the form:

    X2 = V1*X1 + V2*Y1 + V3
    Y2 = V4*X1 + V5*Y1 + V6

where X1, Y1 is the location in coordinate system 1, X2, Y2 is the location in
coordinate system 2, and V1 through V6 are the coefficients. The actual
transformation used was:

    UTM N = -77.095*LINE - 11.468*COLUMN + 3928650
    UTM E = -19.565*LINE + 57.590*COLUMN + 527457

This transformation yielded a 67-meter RMS mapping error at the control
points.
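
The reported transformation can be applied to the tabulated control points to
inspect the residuals. The sketch below is illustrative and is not the AFFINE
program; the function names are hypothetical. The RMS convention assumed here
(distance error averaged over the five points) may differ from the one used in
the report, so the result need not reproduce the 67-meter figure exactly.

```python
from math import sqrt

# Ground control points as tabulated: (line, column, UTM N, UTM E)
GCPS = [
    (1471, 371, 3810874, 520071),
    (1126, 116, 3840521, 512163),
    (1094, 167, 3842405, 515627),
    (1711, 659, 3789234, 531929),
    (1367, 193, 3821109, 511790),
]


def landsat_to_utm(line, column):
    """Apply the affine transformation with the coefficients given in the text."""
    north = -77.095 * line - 11.468 * column + 3928650
    east = -19.565 * line + 57.590 * column + 527457
    return north, east


def rms_error(points):
    """Root-mean-square distance between mapped and measured positions."""
    squared = 0.0
    for line, column, north, east in points:
        n, e = landsat_to_utm(line, column)
        squared += (n - north) ** 2 + (e - east) ** 2
    return sqrt(squared / len(points))


rms = rms_error(GCPS)
```

Printing `rms` gives a figure of the same order as the reported mapping error.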

2.2.2.2.4 SSU Proportion Extraction

With a suitable mapping transformation defined, it was then possible to
locate the 60 SSU's on the classified LANDSAT image and calculate the class
proportions.

EarthSat's program RESAMP was used to resample and extract each SSU. The
data were resampled to a 20-meter UTM grid using a nearest neighbor algorithm.
The 20-meter grid was chosen over a coarser grid to allow for a more accurate
splitting of border pixels (a 20 x 20 meter cell is approximately 1/12 of a
LANDSAT pixel). Each resampled SSU consisted of an 80 x 80 array of class
numbers. EarthSat's program COUNT was used to count the occurrences of each
class and to convert them into proportions. A color coded map of a resampled
SSU classification image is shown in Figure 7. This is the identical SSU shown
in Figures 2, 3, 4, 5 and 6.
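
The counting step can be sketched as below. This is a minimal illustration of
what a program such as COUNT does with one resampled SSU; the function name
and the example grid are hypothetical and do not represent report data.

```python
from collections import Counter


def counts_to_proportions(ssu_grid):
    """Count the occurrences of each class number and convert to proportions."""
    counts = Counter(cell for row in ssu_grid for cell in row)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


# Hypothetical 80 x 80 SSU: top half class 2, bottom half split between 3 and 6.
grid = [[2] * 80 for _ in range(40)]
grid += [[3] * 40 + [6] * 40 for _ in range(40)]
ssu_proportions = counts_to_proportions(grid)
```

Each 80 x 80 array holds 6,400 cells, so one cell contributes about 0.016% to
a class proportion.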



2.2.2.2.5 Digitizing of County Boundary

The county boundary was digitized from U.S.G.S. 15-minute quadrangle maps,
using EarthSat's program TALOS. Files produced for each quadrangle were edited
and merged for input to program CPOINT, a LANDPAK auxiliary program. CPOINT
was used to convert the machine coordinates to UTM coordinates using a
projective transformation computed from digitized map control points.

The file of UTM coordinates was then input to program KCC to produce a
simulated binary image to be used as a mask for the classification image. A
tape with this county boundary image was forwarded to EarthSat's Washington,
D.C., office.

2.2.2.2.6 County Proportion Extraction

In this step the digitized county boundary was logically combined with the
classified LANDSAT image to generate class statistics for the entire Kershaw
County area. RESAMP was again used, this time to resample the entire classified
image to the 50-meter UTM grid of the county boundary file. EarthSat's program
CM8HST was used to compute statistics for those LANDSAT pixels which fell within
the Kershaw County boundary.
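
The logical combination of the boundary mask with the class image can be
sketched as follows. This is an illustration under the assumption that the
mask is a binary image registered to the same grid as the class image; the
function name and the small example arrays are hypothetical.

```python
def masked_class_counts(class_image, county_mask):
    """Tally class numbers for the pixels where the mask is 1 (inside the county)."""
    counts = {}
    for class_row, mask_row in zip(class_image, county_mask):
        for cls, inside in zip(class_row, mask_row):
            if inside:
                counts[cls] = counts.get(cls, 0) + 1
    return counts


# Hypothetical 3 x 4 class image and binary county mask on the same grid.
class_image = [[1, 1, 2, 2],
               [1, 3, 3, 2],
               [4, 4, 3, 2]]
county_mask = [[0, 1, 1, 0],
               [1, 1, 1, 0],
               [0, 0, 1, 0]]

counts = masked_class_counts(class_image, county_mask)
total = sum(counts.values())
county_proportions = {cls: n / total for cls, n in counts.items()}
```

Pixels outside the mask never enter the tally, so the proportions refer to the
county area alone.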

2.2.2.2.7 Map Generation

The map generation capability of the upper level GIS was used to create a
LANDSAT classification map of Kershaw County. EarthSat's program COMBIN was
used to combine the resampled class map with the county boundary map, masking
out all pixels which lay outside the county. The masked class map was then
processed by EarthSat's program OPTRONIC, to assign a color to each class and
create three annotated film recorder compatible tapes (one tape for each primary
color). An Optronics film recorder was used to create three black and white
transparencies from these tapes. EarthSat's photo lab composited the
transparencies to produce a color negative and color photomaps at various
scales.

2.2.3 Analysis

The most significant results of the estimation computation of the Phase IA 
test are presented in this section. Three primary categories of variables are 
of interest: namely, land use, current annual increment, and soil erosion. This 
section is divided into three corresponding subsections, and for each of these 
we will discuss the Step 1 and 2 computations. 

2.2.3.1 Land Use, Step 1

Six land use classifications were defined at the secondary level. They are
summarized in Table 6.

The first is a land use classification with three broad forest type-classes.
The second has three additional forest type-classes obtained by subdividing the
original classes according to tree size. The third classification is similar,
but forest classes are split according to tree densities. In the fourth
classification, the three basic forest types are split two ways: by tree
density and by tree size. For this classification, all other non-forest classes
are collapsed into three: namely, wetland, disturbed and other. The fifth and
sixth classifications were made to test the notion that better estimates can
be obtained for any given class when the class is treated by itself rather
than in conjunction with a number of other classes.

All the classifications are referred to as land use classifications;
specifically, we will refer to each individually as indicated at the top of
Table 6: namely, land use, forest types 1, forest types 2, forest types 3,
grass and water.

TABLE 6

Secondary Land Use Classifications

             Land    Forest     Forest     Forest
    Class    Use     Types 1    Types 2    Types 3    Grass    Water

      1      FC      FCL        FCS        FCLS       GR       W
      2      FH      FCH        FCB        FCLB       NG       NW
      3      FM      FHL        FHS        FCHS
      4      AG      FHH        FHB        FCHB
      5      WL      FML        FMS        FHLB
      6      WT      FMH        FMB        FHHS
      7      GR      AGI        AGI        FHHB
      8      UR      AGN        AGN        FMLS
      9      DS      WL         WL         FMLB
     10      BS      WT         WT         FMHS
     11              GR         GR         FMHB
     12              UR         UR         WL
     13              DS         DS         DS
     14              BS         BS         OT

Meaning of first two characters:

    FC = Forested, Conifer
    FH = Forested, Hardwood
    FM = Forested, Mixed
    AG = Agriculture
    GR = Grass
    UR = Urban
    DS = Disturbed
    BS = Bare Soil
    WL = Wetland
    WT = Water
    NG = Non-Grass
    NW = Non-Water
    OT = Other

Meaning of third and fourth characters:

    L = Low Tree Density
    H = High Tree Density
    S = Small Trees
    B = Big Trees
    I = Irrigated
    N = Non-Irrigated

Two runs were made with program ESTPRP, one for each of the spectral
classifications mentioned in Section 2.2.2.2.2. The county proportions of the
land use classes for each of these runs are listed in the left-hand side of
Table 7. Proportions for the same classes were also computed from the sample
set with the estimator a_av (Section 2.1.2.1). These proportions are reported
in the right-hand side of Table 7. It can be seen in this table that the
estimator a_av, for which the estimated proportions are the SSU averages, is a
fairly good estimator compared with a_pr, which makes use of LANDSAT. The
standard errors for a_pr are fairly uniform for all classes, whereas for a_av
the standard errors vary proportionately with the class proportions. For the
larger classes, a_pr seems to be superior to a_av; however, the standard errors
are small to start with, so that an efficiency increase obtained with LANDSAT
amounts to only a few per cent in increased accuracy.

One interesting effect that can be observed in Table 7 is that a_av estimates
water at 3.24%. The LANDSAT estimator a_pr, however, reduces this estimate to
1.72%, indicating that the sample is not representative of the entire county.
The probable reason is that one SSU is situated on the Wateree Reservoir.

The correlation coefficients are surprisingly high for LANDSAT-based
analyses. The highest correlation obtained in an earlier study relating
LANDSAT data to ground data (van Roessel, 1976) was 0.67. The F values of
11.68 and 8.29 must be compared with F(0.95) for the indicated degrees of
freedom: namely, 1.32 and 1.25, respectively. The linear relationships are,
therefore, highly significant, even satisfying a criterion mentioned by Draper
and Smith (1966), namely that for a useful prediction, the value of F must be
at least four times the critical value.
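
The comparison against the four-times criterion can be checked directly from
the figures quoted above. The short sketch below is illustrative; the function
name is hypothetical, and the F values and critical values are those given in
the text.

```python
def meets_draper_smith(f_observed, f_critical, factor=4.0):
    """True when the observed F is at least `factor` times the critical value."""
    return f_observed >= factor * f_critical


# (observed F, critical F at the 0.95 level) as quoted in the text
runs = {
    "7 spectral classes": (11.68, 1.32),
    "10 spectral classes": (8.29, 1.25),
}

ratios = {name: f / crit for name, (f, crit) in runs.items()}
passed = {name: meets_draper_smith(f, crit) for name, (f, crit) in runs.items()}
```

Both runs exceed the factor of four, the 7-class run by the wider margin.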



TABLE 7

Land Use Proportion Estimates For Kershaw County

                         Estimator: a_pr                    Estimator: a_av
              7 Spectral Classes   10 Spectral Classes
    Land Use  Proportion Standard  Proportion Standard   Proportion Standard
    Class     (Percent)  Error     (Percent)  Error      (Percent)  Error

    FC          33.03      --        33.44      --         32.24      2.85
    FH           6.71      --         6.89      --          6.23      1.02
    FM          28.74      --        29.11      --         28.36      2.14
    AG          14.36      --        12.04      --         13.83      2.24
    WL           5.84      --         5.88      --          5.78      1.25
    WT           1.72      --         1.82      --          3.24      1.56
    GR           6.33      --         6.77      --          6.60      0.99
    UR           1.92      --         2.50      --          2.10      0.80
    DS           1.25      --         1.37      --          1.39      0.33
    BS           0.05      --         0.13      --          0.22      0.12

    R           0.7677               0.7740
    F           11.68                8.29
    D.F.        (63,477)             (90,450)

    -- : value not legible in the source copy

The standard errors in Table 7 can be used to construct individual
confidence intervals for each class using a t(0.975) value with approximately
450 degrees of freedom: 1.96. However, one must not make a simultaneous
interpretation of these intervals.
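
As an illustration, an individual interval is the estimate plus or minus 1.96
standard errors. The sketch below uses the a_av water figures from Table 7;
the function name is hypothetical.

```python
def confidence_interval(proportion, standard_error, t_value=1.96):
    """Individual confidence interval: estimate +/- t * standard error."""
    half_width = t_value * standard_error
    return proportion - half_width, proportion + half_width


# a_av water estimate from Table 7: 3.24 percent, standard error 1.56
low, high = confidence_interval(3.24, 1.56)
```

For water this gives roughly 0.18% to 6.30%. Each interval holds for its own
class only; the intervals are not a joint statement about all classes.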

It is interesting to note that the smaller number of spectral classes pro- 
vides as good a correlation as the larger number, with a higher F value. 

2.2.3.1.2 Forest Types 1

Again, two runs were made with ESTPRP, one for each spectral classifica- 
tion. The results are shown in Table 8. The additional breakdown of the 
forest classes by tree size does not seem to be meaningful, as the correla- 
tion is down by .1 in both cases, and the F values are considerably less than 
those of Table 7. This makes sense when considering that the LANDSAT signa- 
tures are probably related more to tree density than to tree size. 

2.2.3.1.3 Forest Types 2

Here the additional breakdown is by tree density, rather than by tree size.
Results are shown in Table 9. Only one spectral classification was used. The
standard errors are generally lower, and the correlation coefficient and F
value are considerably higher than those for the breakdown by tree size.

2.2.3.1.4 Forest Types 3

The original three forest types FC, FH and FM are all divided by tree
size and tree density, yielding twelve different classes. However, one class
has been eliminated due to the lack of any occurrence for this class in the
aerial photo sample. The total absence of a class in the SSU proportions
causes singularity in the X'X matrix. The non-forest classes were compressed
into three categories: wetland, disturbed and other. The estimated proportions
for the classification are shown in Table 10. Note that the bulk of the
forest category is divided between two classes, FCHB and FMHB.

TABLE 8

Forest Types 1 Proportion Estimates for Kershaw County

                         Estimator: a_pr                    Estimator: a_av
              7 Spectral Classes   10 Spectral Classes
    Land Use  Proportion Standard  Proportion Standard   Proportion Standard
    Class     (Percent)  Error     (Percent)  Error      (Percent)  Error

    FCL         16.20     1.25       16.81     1.25        14.38      2.56
    FCH         17.53     1.24       17.85     1.24        17.86      2.10
    FHL          0.05     1.17        0.09     1.17         0.12      0.10
    FHH          6.51     1.27        6.72     1.27         6.11      1.03
    FML          2.75     1.25        2.86     1.25         2.68      0.69
    FMH         25.92     1.29       26.21     1.29        25.67      2.00
    AGI         14.08     1.09       11.88     1.22        13.80      2.24
    AGN          0.00     0.00        0.00     0.00         0.03      0.02
    WL           5.79     1.21        5.84     1.20         5.78      1.25
    WT           1.68     1.17        1.75     1.17         3.24      1.56
    GR           6.23     1.29        6.44     1.30         6.60      0.99
    UR           1.87     1.23        2.08     1.22         2.10      0.80
    DS           1.22     1.17        1.27     1.17         1.39      0.33
    BS           0.11     1.17        0.16     1.17         0.22      0.12

    R           0.6813               0.6877
    F           7.14                 5.11
    D.F.        (91,689)             (130,650)

2.2.3.1.5 Grass

Because the evaluation of winter range is of specific interest for the 
Phase III test, it was thought to be of special value to determine the efficiency 
with which the grass category can be estimated by itself, with all the other 
land use classes lumped in a non-grass class. The experiment was also of 
general interest because it represents an extreme of the possible number of 
secondary classes. Results are shown in Table 11. 

Correlation coefficients and F statistics are extremely high; however, 
the standard error is somewhat higher than in the previous cases where grass 
was estimated in conjunction with other classes. The reason for this effect is 
not clear at present. To verify that the high values are not the result of an 
induced correlation present when working with proportions (Chayes and Kruskal, 
1966), a random set of proportions of identical size to those used for the 
grass category was generated and processed through program ESTPRP. The resul- 
tant correlation coefficient was 0.2676 and F was 0.59, correctly reflecting 
the random nature of the input data. 

It is interesting to inspect the P matrix for the ten spectral classes
case. This matrix is shown in Table 12.

Most of the grass proportion is due to spectral class 8 (54.66%), which was
initially identified on the classification image as wetland.


TABLE 10

Forest Types 3 Proportion Estimates For Kershaw County

                        7 Spectral Classes
                 Estimator: a_pr        Estimator: a_av
    Land Use   Proportion  Standard   Proportion  Standard
    Class      (Percent)   Error      (Percent)   Error

    FCLS          0.40      1.07         0.46      0.31
    FCLB          2.67      1.15         2.45      0.69
    FCHS          6.39      1.15         5.30      1.77
    FCHB         24.12      1.19        24.02      2.18
    FHLB          0.00      0.00         0.03      0.02
    FHHS          0.00      0.00         0.01      0.01
    FHHB          6.82      1.17         6.20      1.03
    FMLS          0.08      1.13         0.15      0.11
    FMLB          3.36      1.14         2.97      0.70
    FMHS          0.32      1.10         0.34      0.19
    FMHB         25.07      1.19        24.90      2.05
    WL            5.77      1.09         5.78      1.25
    DS            1.27      1.13         1.39      0.33
    OT           23.68      1.10        25.99      2.78

    R            0.7635
    F            11.71
    D.F.         (91,689)

TABLE 11

Grass Proportion Estimate For Kershaw County

                         Estimator: a_pr                    Estimator: a_av
              7 Spectral Classes   10 Spectral Classes
              Proportion Standard  Proportion Standard   Proportion Standard
    Class     (Percent)  Error     (Percent)  Error      (Percent)  Error

    GR           6.63     1.49        6.52     1.46         6.60      0.99
    NG          93.32     1.49       93.42     1.46        93.40      0.99

    R           0.9842               0.9828
    F           285.21               195.15
    D.F.        (7,53)               (10,50)

TABLE 12 
Grass P Matrix 
























2.2.3.1.6 Water

The capability to distinguish water from non-water is also a major require- 
ment for MAIS. A similar analysis as that for grass was therefore undertaken 
for water. Results are shown in Table 13. 

Here the correlations are almost equal to unity, indicating an almost 
perfect correspondence with water as shown by the classification image 
and the aerial photo interpretations. This could be expected, as water is the 
land use class most discernible on LANDSAT images. However, the result demon- 
strates that a very good PSU-SSU registration was obtained. 

In this case, unlike the analysis for grass, the low standard error does
seem to reflect the high correlation. The estimator a_pr is clearly superior
to a_av. Again, Table 13 shows how LANDSAT introduces a global correction for
the water proportion of Kershaw County, as contrasted with the proportion in the
sample which is high because of an SSU located over the Wateree Reservoir.

2.2.3.2 Land Use, Step 2

Once the secondary proportions have been obtained, another class projection
can be applied, and proportion estimates and corresponding covariance matrices
for ground classes can be obtained using the formulation of Section 2.2.2.2.1.
These are the estimates presented in the following sections.

Two ground classification proportion estimates were attempted, one for
general land use classes, the other for more specific forest type-classes.
Class designations are explained in Table 14.

2.2.3.2.1 Ground Land Use Classes

The results for the more general land use classes are shown in Table 15.

TABLE 13

Water Proportion Estimate For Kershaw County

                         Estimator: a_pr                    Estimator: a_av
              7 Spectral Classes   10 Spectral Classes
              Proportion Standard  Proportion Standard   Proportion Standard
    Class     (Percent)  Error     (Percent)  Error      (Percent)  Error

    W            1.86     0.11        1.87     0.15         3.24      1.56
    NW          98.08     0.11       98.08     0.15        96.76      1.56

    R           0.9994               0.9993
    F           21989.08             14707.36
    D.F.        (7,53)               (10,50)

TABLE 14

Ultimate Land Use Classification Definitions

    Class    Land Use

    CF       Commercial Forest
    CR       Cropland
    IP       Improved Pasture
    IF       Idle Farmland
    OF       Other Farmland
    UR       Urban and Other
    WT       Water

    Class    Forest Types

    LLP      Longleaf Pine
    SHP      Slash Pine
    LBP      Loblolly Pine
    SLP      Shortleaf Pine
    PDP      Pond Pine
    OYP      Oak-young Pine
    OHI      Oak-hickory
    SCO      Southern Scrub Oak
    OGC      Oak-gum-cypress
    EAC      Elm-ash-cottonwood
    NC       Non-commercial Forest

TABLE 15

Ground Land Use Class Proportion Estimates For Kershaw County

    Estimator: g, using a_pr

             Proportion   Standard
    Class    (Percent)    Error

    CF         72.28        3.63
    CR         13.28        2.19
    IP          4.74        1.74
    IF          2.16        1.01
    OF          2.69        1.40
    UR          2.70        1.30
    WT          2.03        1.13

TABLE 16

Ground Forest Types Proportion Estimates, Kershaw County

    Estimator: g, using a_pr

             Proportion   Standard
    Class    (Per Cent)   Error

    LLP         2.44        1.14
    SHP        11.05         --
    LBP        20.85        3.33
    SLP         2.18        0.96
    PDP         0.84        0.58
    OYP        10.49         --
    OHI        11.07        2.51
    SCO         3.85        1.26
    OGC          --         1.88
    EAC         3.76        1.37
    NC         27.04        2.92

    -- : value not legible in the source copy

The standard errors are generally larger than those obtained for the secondary
classifications, because the estimates are computed from two sets of random
variables.

2.2.3.2.2 Ground Forest Type Classes

The results for the forest types classification are shown in Table 16.

The non-commercial proportion in Table 16 is the complement of the commercial
forest category in Table 15. These figures total 99.32%, a good result
considering that these tables were derived using separate processes and
different groupings.

2.2.3.3 CAI, Step 1

The first step in arriving at estimates of continuous variables such as
CAI is to select a suitable classification. This classification serves as a
stratification for the second step. Either a secondary or an ultimate
classification can be used.

To compare results, both types are tested. The forest types 3
classification (Section 2.2.3.1.4) and estimates in Table 10 are used for the
secondary classification. The forest types of Section 2.2.3.2 (Table 14) and
the estimates in Table 16 are used for the ultimate classification.

2.2.3.4 CAI, Step 2

The following kinds of estimates can be made for CAI (see Figure 2): by
class, per acre; by class, total; by county, per acre; and by county, total.
The per acre estimates by class are derived solely from the plot data. The
"by class" estimates for the secondary classifications are presented in
Table 17; those for the ultimate classification, Table 18.

TABLE 17

CAI Estimates by Forest Types 3 Class
(Cubic Feet)

                          Standard                    Standard
    Class     Per Acre    Error          Total        Error       Percent

    FCLS        30.0        0.0          60,135      160,539       266.
    FCLB        21.4        7.3         285,862      156,683        54.0
    FCHS        37.8       21.2       1,205,044      709,645        58.8
    FCHB        93.9       10.2      11,322,034    1,350,470        11.9
    FHLB         0.0        0.0               0            0         0
    FHHS         0.0        0.0               0            0         0
    FHHB        61.2        9.2       2,087,965      475,726        22.7
    FMLS         0.0        0.0               0            0         0
    FMLB        27.5       23.3         461,962      422,100        91.4
    FMHS         0.0        0.0               0            0         0
    FMHB        58.5        6.7       7,327,357      906,993        12.4
    WL          60.1       15.3       1,733,242      548,222        31.6
    DS          14.1        7.5          89,727       92,952       104.
    OT           0.8        0.6          98,080       66,425        67.8

    COUNTY TOTAL                     24,671,410    1,865,234         7.56

To ensure compatibility of the estimates, total estimates were made using
the county acreage given in the U.S. Census Report of 1970. The same number
was used by the Southeastern Experiment Station for its Forest Statistics of
Kershaw County (Craver, 1978). A county area estimate was also obtained from
the digitized boundary using program KCC. Another area figure was found in
the Lockheed Ten-Ecosystem Study Final Report (Dillman, 1978). The different
acreage figures are shown in Table 19.

TABLE 19 

Kershaw County Acreage 


Source Acres 

U. S. Bureau of the Census 499,840 
Earth Satellite 501,283 
Lockheed 503,100 


Several other types of estimates for the total CAI were computed. All 
estimates are summarized in Table 20. 

One estimate was made with a_av (no LANDSAT contribution) to obtain an
idea of gain in efficiency due to the primary stage. This gain was estimated
at (8.17 - 7.56)/8.17 x 100% = 7.5%.
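
The gain figure is the relative reduction in the percent standard error of the
county total. A minimal check of the arithmetic, with a hypothetical function
name:

```python
def efficiency_gain(se_without, se_with):
    """Relative reduction in percent standard error, expressed as a percentage."""
    return (se_without - se_with) / se_without * 100.0


# Percent standard errors of the county total: a_av (8.17) and a_pr (7.56)
gain = efficiency_gain(8.17, 7.56)
```

Rounded to one decimal place, `gain` equals the 7.5% quoted above.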

The estimate obtained with the ultimate classification had a standard
error approximately twice as large as the one computed with the secondary
classification. This is due to the introduction of an additional set of
random variables. Also, a stratification by species does not seem too
meaningful when considering growth. A stratification by site class would be
more appropriate.

TABLE 18

CAI Estimates by Ultimate Forest Type Class
(Cubic Feet)

                          Standard                    Standard
    Class     Per Acre    Error          Total        Error       Percent

    LLP         22.1       11.2         269,645      185,421        68.7
    SHP         98.7       18.7       5,449,095    1,648,118        30.2
    LBP         94.4        8.9       9,834,346    1,823,546        18.5
    SLP         81.0       22.0         880,818      457,549        51.9
    PDP         94.0       27.0         392,686      295,973        75.4
    OYP         40.1        7.3       2,101,352      613,955        29.2
    OHI         46.7        7.6       2,582,602      720,295        27.9
    SCO         11.2        2.3         215,796       83,500        37.7
    OGC         45.0        8.6       1,350,130      495,670        36.7
    EAC         73.0       14.0       1,373,063      563,622        41.0
    NC           0.0        0.0               0            0         0

    COUNTY TOTAL                     24,449,534    3,858,145        15.78

TABLE 20

Total CAI Estimate Summary, Kershaw County

                                                          Standard
    Type of Estimate                          Total       Error       Percent

    Stratification with Secondary Forest
      Types 3 Classification using a_pr     24,671,410   1,865,234      7.56
    Same with a_av (without LANDSAT)        24,134,085   1,970,590      8.17
    Stratification with Ultimate Forest
      Types using a_pr, g                   24,449,534   3,858,145     15.78
    Plot data used as Simple Random
      Sample                                26,474,90    8,749,250     33.05
    South Carolina '78 Forest Statistics    24,435,000   1,771,540      7.25

To compare the estimates obtained with a_av, g and a_pr, which are based on
the GIS technology, with a simple random sampling estimator, a fourth estimate
was computed from the plot data by averaging the CAI's for each plot and
multiplying this average with the county area. The standard error of this
estimate is approximately four times the one obtained with the secondary forest
types 3 stratification, thus dispelling any doubt that the employed technology
does contribute to the sampling efficiency for estimating current annual
increment.
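
The simple random sampling estimate described above can be sketched as
follows. The function name and the plot values are hypothetical and are not
the report's plot data; the standard error formula assumes independent plots
and ignores any finite population correction.

```python
from math import sqrt
from statistics import mean, stdev


def srs_total(plot_values, county_acres):
    """Expand the plot mean to a county total, with its standard error.

    Assumption: plots are treated as an independent simple random sample,
    so the standard error of the mean is s / sqrt(n).
    """
    n = len(plot_values)
    total = mean(plot_values) * county_acres
    se_total = county_acres * stdev(plot_values) / sqrt(n)
    return total, se_total


# Hypothetical per-acre CAI values (cubic feet) and a hypothetical area.
plots = [10.0, 20.0, 30.0]
total, se_total = srs_total(plots, 100.0)
```

Because no stratification is used, the full plot-to-plot variability enters
the standard error, which is why this estimator performs so much worse than
the stratified ones.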

The final estimate in Table 20 is the one reported in the publication,
"Forest Statistics for the Northern Coastal Plain of South Carolina" (Craver,
1978): Table 8, column 6, Kershaw County. The corresponding standard error
(7.25%) was obtained from the table on page 5 of the same publication. This
figure supports the hypothesis that Phase IA estimates made with the prototype
MAIS system are of the same quality as those obtained with current practices.

2.2.3.5 Soil Erosion, Step 1

The derivation of the secondary classification for erosion potential is
given in Section 2.2.2.1.5. A sample set of 12 SSU's was created with seven
erosion potential classes. This set was input to program ESTPRP, together
with the 7-class spectral classification proportions. The estimated
proportions in each erosion potential class are shown in Table 21.

The correlation coefficient is high, but the F statistic barely satisfies
the four times critical value criterion of Draper and Smith (1966), at the
0.05 level (8.39 = 4.69 x 1.71). These effects are due to the small sample
size. Given that enough coefficients are estimated in relation to the number
of observations, a correlation coefficient can always be forced to unity. As
shall be seen in the next section, this situation is reflected in a large
sampling error for the county total.

TABLE 21

Erosion Potential Class Proportion Estimates For Kershaw County
(Percent)

                       7 Spectral Classes
    Erosion       Estimator: a_pr         Estimator: a_av
    Potential                Standard                Standard
    Class      Proportion    Error     Proportion    Error

    ER1          54.95        4.33       58.43        5.47
    ER2           8.77        3.81        6.01        1.66
    ER3          24.99        3.96       27.30        5.33
    ER4           6.51        3.92        4.66        1.79
    ER5           4.25        3.92        3.27        1.87
    ER6           0.08        3.22        0.07        0.05
    ER7           0.38        3.22        0.28        0.11

    R            0.9502
    F            8.39
    D.F.         (42,30)

2.2.3.6 Soil Erosion, Step 2 

The class midpoint was assigned as the continuous variable for each erosion 
potential class. These per acre estimates, as well as the total estimates by 
class and the total estimates for Kershaw County, are shown in Table 22. Both 
the estimators a_av and a_pr were tested. 

The difference between the county total standard errors for the estimators 
using either a_pr or a_av (LANDSAT regression and area photo averages) is striking. 
The difference is due to the high standard errors of the high potential erosion 
classes for the a_pr estimator as indicated in Table 21. It is another sign of 
the marginal performance of regression techniques using an inadequate sample. 

In this situation, it seems that one is better off using the secondary sample 
proportions alone. It is hoped that the same test can be performed during 
Phase II with a larger sample size. It is also possible that a_rs may perform 
better than a_pr in the case of a limited number of observations. The total 
county estimates for both techniques conform closely (2,256,805 and 2,021,931 
tons/year). 
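
The Step 2 arithmetic described in this section (class midpoint times class acreage) can be sketched as follows. This is an illustration under stated assumptions, not the program used in the test: the names are hypothetical, and the county acreage is not given in the report, so it is back-calculated here from the first class total in Table 22. The proportions and standard errors are the first estimator columns of Table 21.

```python
# Class midpoints, taken from the "Per/Acre" column of Table 22.
MIDPOINTS = {"ER1": 1, "ER2": 3, "ER3": 6, "ER4": 12,
             "ER5": 24, "ER6": 48, "ER7": 96}

# Proportions and standard errors in percent (first estimator of Table 21).
PROPORTIONS = {"ER1": 54.95, "ER2": 8.77, "ER3": 24.99, "ER4": 6.51,
               "ER5": 4.25, "ER6": 0.08, "ER7": 0.38}
STD_ERRORS = {"ER1": 4.33, "ER2": 3.81, "ER3": 3.96, "ER4": 3.92,
              "ER5": 3.92, "ER6": 3.22, "ER7": 3.22}

# ASSUMPTION: county area in acres, back-calculated from Table 22 as
# 274,686 / (1 * 0.5495); the report itself does not state this figure.
COUNTY_ACRES = 499_884


def class_estimates(acres, midpoints, proportions_pct, std_errors_pct):
    """Scale class proportions (and their standard errors) to class totals."""
    rows = {}
    for cls, midpoint in midpoints.items():
        total = acres * proportions_pct[cls] / 100.0 * midpoint
        std_error = acres * std_errors_pct[cls] / 100.0 * midpoint
        rows[cls] = {
            "total": total,
            "std_error": std_error,
            "percent": 100.0 * std_error / total,  # standard error as % of total
        }
    return rows


estimates = class_estimates(COUNTY_ACRES, MIDPOINTS, PROPORTIONS, STD_ERRORS)
for cls, row in estimates.items():
    print(f"{cls}: total={row['total']:,.0f}  "
          f"se={row['std_error']:,.0f}  percent={row['percent']:.1f}")
```

Because the high potential classes carry the largest midpoints, a standard error of a few percentage points on a very small proportion is multiplied into a very large error in the class total. The county total standard error is not computed here, since it also depends on covariances between classes that the report does not tabulate.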

TABLE 22 

Erosion Potential Estimates For Kershaw County 
(Tons/Year) 

Erosion                     Estimator g, using a_pr           Estimator g, using a_av
Potential        Per                 Standard                          Standard
Class           Acre        Total       Error  Percent        Total       Error  Percent

ER1                1      274,686      21,631      8.0      292,035      27,342      9.0
ER2                3      131,551      57,096     43.0       90,071      24,867     28.0
ER3                6      749,552     118,725     16.0      818,637     160,329     20.0
ER4               12      390,645     235,377     60.0      279,260     107,401     38.0
ER5               24      510,059     470,754     92.0      392,374     223,972     57.0
ER6               48       20,096     772,579    3844.       15,995      11,040     69.0
ER7               96      180,215   1,545,158     857.      133,557      51,522     39.0

COUNTY TOTALS           2,256,805   1,486,338     66.0    2,021,931     337,965     17.0


2.2.3.7 Map Legends 

One of the unique aspects of the ICLS regression approach is that the 
estimated class transformation matrix can be used to construct a map legend of 
the primary map in terms of a secondary classification. Table 23 shows the 
P matrix to transform from seven spectral classes to ten secondary land use 
classes. Using this matrix and a convention of reporting the three major 
secondary classes for each primary class, one can assign a legend to each 
cluster color on the primary classification map, as shown in Table 24. This 
legend is for the LANDSAT classification map (also see Figure 7) submitted as 
a deliverable product for this project. To make this legend, a cluster-color 
assignment table was used. Twelve colors are shown on this map. Since only 
seven spectral classes are used in the analysis, a simplified map of seven 
colors could be made with a simpler legend. The legend of Table 24 is only 
a preliminary product; with feedback, a much better map legend can be 
produced. 


TABLE 23 

Class Transformation Matrix 

             FC       FH       FM       AG       WL       WT       GR       UR       DS       BS

ALN      1.0001   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
B        0.5807   0.0000   0.2082  -0.0000   0.1888   0.0000   0.0000   0.0000   0.0224  -0.0000
C       -0.0000   0.1958   0.4050   0.2250   0.0426   0.0086   0.0889   0.0000   0.0322   0.0020
D        0.6834   0.0193   0.2907   0.0000  -0.0000   0.0000   0.0067  -0.0000   0.0000   0.0000
EGIJ     0.1107  -0.0000   0.1384   0.5049   0.0070   0.0000   0.1715   0.0669  -0.0000   0.0007
F        0.0000   0.2439   0.5939  -0.0000   0.0000  -0.0000   0.1013   0.0551   0.0059  -0.0000
HKM     -0.0000   0.0000   0.1079   0.0000   0.0000   0.8922   0.0000   0.0000  -0.0000  -0.0000

Row labels: Cluster Composition for Spectral Classes 
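
The legend convention described in this section can be sketched as a small routine that ranks the secondary classes in one row of the class transformation matrix. The function names, the one-percent cutoff, and the strict top-three rule are assumptions made for illustration; Table 24 does not follow a strict top-three rule in every row, so this should not be read as the procedure actually used. The two sample rows are copied from Table 23, with negative zeros written as zero.

```python
SECONDARY_CLASSES = ["FC", "FH", "FM", "AG", "WL", "WT", "GR", "UR", "DS", "BS"]

# Rows B and HKM of the class transformation matrix (Table 23).
P_ROWS = {
    "B":   [0.5807, 0.0000, 0.2082, 0.0000, 0.1888,
            0.0000, 0.0000, 0.0000, 0.0224, 0.0000],
    "HKM": [0.0000, 0.0000, 0.1079, 0.0000, 0.0000,
            0.8922, 0.0000, 0.0000, 0.0000, 0.0000],
}


def legend_entry(row, class_names, top_n=3, min_percent=1):
    """Return up to top_n (class, rounded percent) pairs, largest share first."""
    shares = [(name, round(100 * p)) for name, p in zip(class_names, row)]
    shares = [s for s in shares if s[1] >= min_percent]  # drop empty classes
    shares.sort(key=lambda s: s[1], reverse=True)
    return shares[:top_n]


def format_legend(entry):
    """Render a legend entry in the style of Table 24."""
    return ", ".join(f"{pct}% {name}" for name, pct in entry)


for spectral_class, row in P_ROWS.items():
    print(spectral_class, "->", format_legend(legend_entry(row, SECONDARY_CLASSES)))
```

Applied to row B, the routine returns the same three shares that Table 24 lists for the Dark Green cluster color.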

TABLE 24 

Legend for Landsat Classification Map 

Color          Land Use Composition

Brown          100% Forest Conifer
Dark Green     58% Forest Conifer, 21% Forest Mixed, 19% Wetland
Light Green    40% Forest Mixed, 23% Agriculture, 20% Forest Hardwood
Orange         68% Forest Conifer, 29% Forest Mixed
Red            51% Agriculture, 17% Grass, 14% Forest Mixed, 11% Forest Conifer
Purple         60% Forest Mixed, 24% Forest Hardwood, 10% Grass
White          See Red
Light Blue     89% Water
Black          See Red
Light Blue     See Red
Blue           89% Water
Yellow         See Brown
Blue           89% Water
Grey           See Brown


3.0 SUMMARY 


A prototype system test was undertaken to assess the workability of the 
proposed MAIS design. The most important aspect in assembling a system from 
a set of components is the integration of the components into a workable entity. 
It was realized that for MAIS, the vital link in this process is the "estimation 
subsystem", and hence the evaluation effort was directed at trying the proposed 
techniques for this subsystem in a set of preliminary tests for one county. 

The results seem to support the notion that the basic scheme works very 
well. Estimates which heretofore were impossible to make using conventional 
methods can be made using GIS and LANDSAT technology (erosion potential). 
Conventional estimates can be made with the same accuracy as current methods, 
hopefully at reduced cost. The proposed techniques seem to be both robust and 
flexible. High LANDSAT classification accuracies are by no means required, and 
using the class transformation concept, one can produce a wide variety of 
estimates to satisfy many needs. Valuable insight was gained into a possible 
structure for a permanent estimation subsystem. An automatic file handling 
system along the lines of the transaction concept outlined in the concept 
development document is highly recommended. Also, an automated report generator 
must be included in a future estimation subsystem. 

In conclusion, it seems that the major reservations concerning the proposed 
techniques have been eliminated. Some problems remain due to time and resource 
constraints. It is hoped that they can be addressed in the Phase II effort. 
1.0 GENERAL APPROACH 


In Volume I (Multiresource Inventory Design and Sampling Network) 
and in Volume II (MAIS Concept Development) a number of techniques have 
been proposed for use on the Pilot Test Implementation. These techniques 
are at many different stages of development. Some have been widely 
applied in the past, so their value and limitations are well-documented 
in at least some geographic areas and application areas. Others are 
proposed that have not, to our knowledge, been applied before in the 
context of a multiresource inventory. 

In this volume, we address two questions: 

* What is the level of maturity of each technique that will be 
employed in the Pilot Test Implementation? This includes 
experience with the method, the way that prior experience can 
be related to the proposed use on the Pilot Test, and the way 
that the technique has been embodied in available software and 
hardware. 

* How far does the state-of-the-art in each component used on 
the Pilot Test match the perceived needs of the project? This 
implies a comparison of requirements against capabilities, and 
this comparison must be at least partially a subjective one. 

Following a component-by-component analysis that examines the two 
questions given above, research and development needs can be stated. 
When the state-of-the-art falls short of the needs, the work required to 
improve that state-of-the-art can in many cases be identified. Thus, 
component evaluation leads naturally to suggestions for areas of 
development that should be pursued by research activities in support of 
performance of the Pilot Test. These research activities are themselves 
outside the scope of the Pilot Test Implementation; realistically, their 
results may not be available in time to be useful in the South Carolina 
Test. There is a possibility that new techniques can be developed to 
apply to the Western State Test. In addition to identifying research 
areas and giving subjective assessments of the magnitude of the effort 
needed, we will also attempt to scope the length of time that may be 
needed to convert research work to a useful new tool for real inventories. 


2.0 SYSTEM REQUIREMENTS 


Two sets of elements must work together if the Pilot Test is to be 
performed successfully. These consist of the data sources and the 
processing components that manipulate them, where the latter will 
embody the mathematical models. The processing components must contain 
a capability to handle all major data types. These data types consist 
of: 

• Landsat data, in image and CCT format 

• NCIC digital terrain data in tape format 

• Collateral data in map format 

• Aerial photography, in resource photography or optical bar 
format 

• Attribute (point) data 

The processing components that manipulate these data types consist 
of the following: 


2.1 Upper Level GIS Components 


    INPUTS                     COMPONENT               OUTPUTS

A.  Old Format Landsat Data    Preprocessor            Band Sequential,
                                                       Decalibrated, Synthetic
                                                       Pixels Deleted, Data
                                                       Drops Filled

B.  Output of A                Scan Line               Same Format, Scan Line
                               Suppression             Removed

C.  New Format                 Preprocessor            Band Sequential Landsat,
    Landsat P Tape,                                    Data Drops Filled
    Band Sequential

D.  Output of B or C           GCP Location            List of Control Points
    User Input
    UTM Base Maps

E.  Outputs of B & D           Landsat Specific        Landsat Band Sequential
                               Geometric Correction    in 50 Meter UTM

F.  NCIC Digital               Preprocessor            Elevation File, Map
    Terrain File                                       Control Points

G.  Output of C & D            General Map             Input File at UTM 50
    or F                       Transform               Metre

H.  Elevation File             Slope Calculation       Slope Class File
    From G

I.  Elevation File             Aspect Calculation      Aspect File
    From G

J.  Map Input                  Digitizer Interface     Line Segment
                                                       Coordinate Chains

K.  Output of J                Arc-Cell Convert        50 Metre UTM Cell
    User Inputs                                        File of Map Input
    Polygon Attributes,
    Control Points

L.  Outputs From E,            Map Unit Extraction     Map Unit Files in Data
    G, H, I, K                                         Base Format

M.  Multiple Data Layers       Cluster                 Class Map
    # of Classes Desired                               Class Means
                                                       Class Variance

N.  Multiple Data Layers       Unsupervised            Class Map
    Training Areas             Classification

O.  Multiple Data Layers       Boolean Combination     Resultant Data Layer

P.  Multiple Data Layers,      Linear Combination      Resultant Data Layers
    Combination Coefficients

Q.  Single Layer,              Area Tabulation         Acreage Totals for
    or Single Layer Plus                               Classes for Entire Layer
    Boundary Layer                                     or Within Selected
                                                       Boundaries

R.  Multiple Layers            Statistical             Statistics (Mean,
                               Processor               Variance, Covariance,
                                                       Cross-Correlation)

S.  Single Layer,              Map Processor           Choropleth Map
    Desired Scale                                      of Data Layer
    Color Assignments
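
As an illustration of one entry in the component list above, the area tabulation step (component Q) can be sketched as a cell count over a 50 metre UTM class layer, converted to acres. This is a minimal sketch rather than the existing software: the function name and the list-of-lists layer representation are assumptions, while the conversion uses the standard figure of 4,046.8564224 square metres per acre.

```python
from collections import Counter

CELL_SIZE_M = 50.0                 # 50 metre UTM cells, as in the component list
SQ_M_PER_ACRE = 4046.8564224       # square metres in one acre
ACRES_PER_CELL = CELL_SIZE_M ** 2 / SQ_M_PER_ACRE


def tabulate_acres(class_layer, boundary_layer=None, boundary_id=None):
    """Acreage per class for a whole layer, or within one boundary unit.

    class_layer and boundary_layer are equally sized lists of rows
    (a hypothetical in-memory layout chosen for this sketch).
    """
    counts = Counter()
    for r, row in enumerate(class_layer):
        for c, cls in enumerate(row):
            if boundary_layer is not None and boundary_layer[r][c] != boundary_id:
                continue  # cell lies outside the selected boundary
            counts[cls] += 1
    return {cls: n * ACRES_PER_CELL for cls, n in counts.items()}


# Small made-up layers, for illustration only.
classes = [["FC", "FC", "AG"],
           ["FC", "WT", "AG"]]
counties = [[1, 1, 2],
            [1, 2, 2]]

print(tabulate_acres(classes))
print(tabulate_acres(classes, counties, boundary_id=2))
```

Each 50 metre cell covers about 0.62 acre, so acreage totals follow directly from cell counts once the layers share the same grid.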



2.2 Lower Level GIS Components 


    INPUT                      PROCESS                     OUTPUT

A.  Arc Extraction
    Maps, Aerial Photos        1. Thinning                 Digitized Arcs in
    (Optical Bar)              2. Conversion to UTM        Standard Format
                                  Coordinates
                               3. Conversion to
                                  Standard Format
                               4. Scale Change and
                                  Adjustment
                               5. Photogrammetric
                                  Adjustment

B.  Arc Polygon Conversion
    Arcs in Standard           1. Converting Arcs to       Polygons and Areas
    Format                        Polygons                 in Standard Format
                               2. Area Calculation

C.  Attribute Data Entry
    Attribute Data From        1. Interactive Attribute    Attribute Records in
    Forms and Maps                Entry                    Standard Format
                               2. Bit Packing of
                                  Data Item Values

D.  Slope/Aspect Generation
    Contour Line Areas         1. Generation of Slope      Slope and Aspect
    in Standard Format            and Aspect Maps          Polygons in Standard
                                                           Format

E.  Data Base Insertion
    Arc and Polygon            1. Insert Ready Images      Data Base
    Records in Standard           Into Data Base
    Format

F.  Data Base Retrieval
    Data Base                  1. Obtain SSU Images        SSU Image Records
                                  From Data Base           in Standard Format

G.  Command Language
    Analyses and Retrieval     1. Compile Routine          Executable Code
    Problem Formulation           Executable Code

H.  Attribute Search
    SSU Image Records          Filtering of SSU Image      SSU Image Records
                               Subareas for Selected       With Desired Attributes
                               Attributes

I.  Zone Generation
    SSU Image Records          Generation of Zones         SSU Image Zone Records
    With Desired Attributes    Around SSU Sub-Units

J.  Overlay
    SSU Image Records          Boolean Combination         SSU Image Records,
                               of SSU Layer Images         Derived Layers

K.  Report Generation
    SSU Image Records          Retrieval and Sorting       Reports, Attributes for
                               of SSU Sub-Unit             Correlation with Upper
                               Attributes                  Level GIS Information

L.  Map Generation
    SSU Image Records          1. Spatial Display and      Maps for Visual
                                  Assembly of SSU          Inspection, Screen
                                  Images                   Displays

M.  Updating
    SSU Images Data Base       Process SSU Changes         Updated Data Base
                               Against Existing SSU        SSU Images
                               Images to Obtain SSU
                               Images
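
The area calculation step listed under Arc Polygon Conversion can be illustrated with the standard shoelace formula applied to polygon vertices in UTM metres. The report does not say which method the existing FORTRAN code uses, so this is only an assumed approach, and the function names are hypothetical.

```python
SQ_M_PER_ACRE = 4046.8564224  # square metres in one acre


def polygon_area_sq_m(vertices):
    """Area of a simple polygon from (easting, northing) pairs in metres.

    Uses the shoelace formula; vertices may be listed in either direction,
    and the closing edge back to the first vertex is implied.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_area_acres(vertices):
    """Same area, converted from square metres to acres."""
    return polygon_area_sq_m(vertices) / SQ_M_PER_ACRE


# A made-up 100 m by 100 m square, for illustration only.
square = [(500000.0, 3800000.0), (500100.0, 3800000.0),
          (500100.0, 3800100.0), (500000.0, 3800100.0)]
print(polygon_area_sq_m(square), polygon_area_acres(square))
```

Working in UTM coordinates keeps the computation planar, which is why conversion to UTM precedes the polygon steps in the component list.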


2.3 Additional Project Components 


A. Sample Assignment 

Provides the distribution of data samples needed for estimation of 
resource variables. 




B. Map Regression Relationships 

Develops the correlations between aerial photo and Landsat 
data for both continuous and non-continuous variables. 

C. Landsat Mapping Extension 

Permits the use of Landsat data as a mapping tool that can 
extend resource information to areas not covered by aerial photos 
and ground samples. 

D. Photointerpretive Keys 

Provides the set of interpretive keys that allow Landsat scenes 
to be classified to multiple resource classes via manual photographic 
analyses. 


3.0 WORKING STATUS OF GIS COMPONENTS 


Not all components listed in the previous section have reached the 
same level of maturity in either concepts or in practical implementation. 
In this section, a subjective evaluation is offered of the stage of 
development of each component. In Section 4 the useful research activities 
that may apply to certain selected system components are described. 

3.1 Upper Level GIS Components 

3.1.1 Working versions of components A through G are known to be 
available, although the capability to fill in dropped data lines 
from Landsat data has not been added to them. (Programs exist for 
DEC PDP computers at Goddard Space Flight Center, and for IBM and 
PRIME computers at Earth Satellite Corporation; other versions also 
undoubtedly exist at ERIM.) 

3.1.2 Working versions of the slope and aspect calculations (H 
and I) exist, though not in the format conceived for use in the 
Pilot Test. Minor modifications of the presently working software 
will take care of this (programs exist on the AOIPS system at 
Goddard Space Flight Center). 

3.1.3 Working versions of components J and K exist for the PRIME 
computer, and also in the case of component K for the IBM 360 
computers (see CACM, 22, 518; 1979). 


3.1.4 Working versions of component L exist on several different 
computers, including the IBM 360/158 and sister machines (Purdue 
University, Johnson Space Center). 

3.1.5 Working versions of components M and N exist at a wide 
variety of installations, and with a variety of options (Bayes 
estimation, maximum likelihood, binary classifiers, etc.). However, 
there is a definite lack of information about reliability of 
classification results in mixed-resource environments. This is 
particularly true if collateral variables are included in the 
classification process. 

3.1.6 Working versions of component O exist for IBM, DEC, Honeywell, 
and UNIVAC equipment. 

3.1.7 Working versions of P through S exist on the PRIME, DEC, and 
IBM 360 computer systems. 

SUMMARY: The functions of the upper level GIS present no real 
problems for implementation of Phase II of the Pilot Test. All 
components have been developed already. The major uncertainties 
arise from the question of accuracy of results obtained in some 
components. 

3.2 Lower Level GIS Components 

All components of the lower level GIS exist in FORTRAN in 
working form on at least one computer system (PRIME). 
3.3 Additional Project Components 

3.3.1 A. Sample Assignment 

This does not currently exist in the form that will 
be needed for use on the Pilot Test program. Methods are 
clear, but code must be developed. 

3.3.2 B. Map Regression Relationships 

A good deal of development work is still needed, for 
both the details of the methods and for the programming 
of the methods. This will require some added thought and 
development during Phase II of the project. 

3.3.3 C. Landsat Mapping Extension 

This also needs development. The proposed method 
for use on Phase II of the Pilot Test has not been used 
before. The programs must be developed and applied in 
the course of the Phase II implementation. 

3.3.4 D. Photointerpretive Keys 

The procedure for PI Key development is well known, 
but its application to the specific environment of South 
Carolina has not been made in the context of a multiresource 
survey. Most of the tools required for the application 
of these keys to Landsat data in South Carolina will be 
provided by the results of Phase I of the project, but 
there will still be elements to be looked at further 
during Phase II implementation. 
4.0 RESEARCH AND DEVELOPMENT CONSIDERATIONS 


This section presents a partial list of research topics that could 
provide a significant contribution to the effectiveness of the MAIS. 
They are in no particular order of priority, because it is difficult to 
determine in advance just how valuable they will be if successfully 
carried out and incorporated into the MAIS. 

These activities are not appropriate for inclusion in the test 
itself, since there is no way of guaranteeing that results useful to 
test performance can be derived in time. Note also that no allowance 
for the cost of these items is given in Volume IV, nor is any calendar 
time devoted to them. Task 19 of Volume IV serves a different function. 
It is intended exclusively to provide the effort needed to monitor and 
remain aware of on-going activities of possible use to Pilot Test activities. 

4.1 Computerized Classification Algorithms 

Supervised and unsupervised algorithms for multispectral data 
have been applied predominantly in agricultural experiments, and 
have proved most successful in dealing with large, regular, single 
crop areas, with gentle terrain. They have been much less successful 
in dealing with agriculture where small fields are common, and 
least successful in dealing with cases where there is considerable 
variation in altitude and aspect within a scene. 

Useful accuracy figures for computerized multispectral 
classification methods are hard to come by. There appears to be no specific 
experience that will be directly useful in telling how effective 
the classification algorithms will be in the South Carolina Pilot 
Test. 

This is certainly a case where application of some research 
and development effort is worthwhile, to provide some quantitative 
measures of accuracy, and to show the best way to combine Landsat 
data and collateral data for that purpose. In particular, the 
sensitivity of classification accuracies to misregistration of 
multiple coverage needs to be looked at in some systematic way for 
application to forested scenes. 

4.2 Photogrammetric Reduction of Aerial Photography 

This has been done for many years with conventional resource 
photography. The high resolution color I/R photography obtained 
from the optical bar camera, with its wide angle and significant 
variation in aspect angle across the image, is another matter. 

There is little or no experience that shows how effective this 
source will be in the multiresource mapping and inventory problem. 

Development effort for assessment of the use of optical bar 
data would appear to be well worthwhile. 

4.3 Regression Relationships Between Landsat Data, Aerial 
Photography, and Ground Samples 

The use of regression relations of this type is not new (see 
Volume I), but the way in which it is proposed to apply the method 
to the Pilot Test does appear to be novel and so far untried. A 
key factor will be the correlation potential of Landsat with the 
interpreted aerial photography. Unless that correlation is reasonable, 
the method for using Landsat in either the resource estimation or 
the mapping mode is of doubtful value. 



Although this component has not been previously evaluated, it 
is difficult to see how any independent evaluation outside the 
bounds of the Pilot Test would be of much use in contributing to 
the Pilot Test. Therefore, although this is an untried area, the 
recommendation here is to apply the technique in the Pilot Test and 
monitor closely the results. Although this is certainly a research 
area, the research will be conducted in performance of this project. 

4.4 The Use of Landsat Data in a Mapping Mode 

The use of Landsat in the resource inventory mode can be 
thought of as a form of data averaging, since in that case the 
objective is aggregate figures for different resources. The mapping 
mode, however, makes much greater demands on the data. In this 
mode, extrapolation from areas of known resources to areas of 
unknown resources is attempted via the use of Landsat. This process 
is more sensitive than an averaging procedure, and is more likely 
to lead to errors of both omission and commission. Additional 
research on the proposed method is definitely needed. 

4.5 Determination of Resource Variables - Current Annual Increment 

This variable is not estimated by any of the conventional 
techniques of remote sensing. However, by regarding timber as a 
crop which responds to the same environmental variables as any 
other crop, there is a potential for modeling the "yield" of timber 
year by year in just the same way as crop yields can be estimated. 
This approach is the subject of the document "MAIS - Multi-Resource 
Analysis and Information System Research and Development Component 
Requirements Discussion for Dynamic Factors" (Earl S. Merritt, 
April 1980) where several dynamic aspects of the forest environment 
and resources are discussed. These include: 

* Mean annual increment 

* Erosion 

* Forest fire prevention and management 

* Range stocking 

* Watershed factors 

* Disease and insect vector propagation. 

The use of such modeling methods needs to be looked at in much 
more detail before the potential can be evaluated. It is a suitable 
subject for research and development in connection with the MAIS. 

If successful, it would require the addition of certain dynamic 
components to the presently conceived MAIS. 

4.6 Change Detection, Identification and Measurement Using Landsat 
Data 

Experiments in the Pacific Northwest show that change detection, 
particularly for clear-cut areas, is very feasible using Landsat as 
a data source. The minimum size of areas that can be monitored in 
this way, however, is of the order of 10 acres (though others have 
reported limited success at change detection of as little as a 
single Landsat pixel). Change identification is more difficult, 
and more work is needed on it. Measurement is relatively 
straightforward if the changed categories are not subject to confusion, but 
here also there is need for research work to determine the practical 
limits of what can be done with Landsat. No experience base exists 
for change detection, identification and measurement using Landsat 
in the South Carolina environment. 

4.7 Multiple Resource Keys 

The use of PI Keys for forestry using remotely sensed data is 
well established. Less well-established is the development of 
multiple resource keys, where several resources may be contained in 
a single location, and where each resource may call for a completely 
separate resource map (for example, the geographic species distribution 
of the understory may be quite different from the species distribution 
of the overstory). Work on these multiple resource keys is going 
on in Asheville, and additional efforts to provide good keys in the 
Southeast may be appropriate as part of the Pilot Test associated 
research efforts. 


APPENDIX B 


Photo Interpretation (PI) Classes and Codes for 
Multiresource Methods Pilot Test, Phase IA 


Class Name                              EarthSat PI Code

Forest
  1. Conifer                            F1 i j k
  2. Hardwood                           F2 i j k
  3. Mixed                              F3 i j k

  Where: i is an index of size of the dominant trees.
             1 = sapling size and smaller
             2 = pole size
             3 = sawtimber

         j is an index for tree density (% crown cover).
             1 =  0 -   5%
             2 =  6 -  15%
             3 = 16 -  35%
             4 = 36 -  65%
             5 = 66 - 100%

         k is a background component, an explanation
         for the remaining ground cover.
             1 = pine understory
             2 = hardwood understory
             3 = mixed pine-hardwood understory
             4 = grass or herbaceous understory

Agriculture
  1. Idle farmland                      A1
  2. Irrigated cropland                 A2
  3. Non-irrigated cropland             A3
  4. Other farmland                     A4

Wetland
  1. Permanent, trees                   S1
  2. Permanent, other cover             S2
  3. Intermittent, trees                S3
  4. Intermittent, other cover          S4

Water
  1. Flowing, census                    W1
  2. Flowing, non-census                W2
  3. Contained, census                  W3
  4. Contained, non-census              W4

Grass
  1. Natural rangeland                  G1
  2. Improved pasture                   G2

Urban (developed or industrial)         U

Disturbed
  1. Regenerative                       D1
  2. Non-generative                     D2

Brush                                   B

Rock                                    R

Other                                   T
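
The forest code scheme in this appendix can be illustrated with a small decoder. This is a sketch under assumptions: the function name is hypothetical, and the codes are assumed to be written as the letter F followed by four digits (forest type, i, j, k), with or without separating spaces, which the appendix does not specify.

```python
FOREST_TYPES = {"1": "Conifer", "2": "Hardwood", "3": "Mixed"}
SIZE_CLASSES = {"1": "sapling size and smaller", "2": "pole size", "3": "sawtimber"}
CROWN_COVER = {"1": "0-5%", "2": "6-15%", "3": "16-35%", "4": "36-65%", "5": "66-100%"}
UNDERSTORY = {
    "1": "pine understory",
    "2": "hardwood understory",
    "3": "mixed pine-hardwood understory",
    "4": "grass or herbaceous understory",
}


def decode_forest_code(code: str) -> dict:
    """Decode a forest PI code such as 'F1 2 3 1' into its four components.

    ASSUMPTION: the code is 'F' plus four digits (type, i, j, k); spaces
    between the parts are ignored.
    """
    compact = code.replace(" ", "").upper()
    if len(compact) != 5 or compact[0] != "F":
        raise ValueError(f"not a forest PI code: {code!r}")
    forest_type, i, j, k = compact[1:]
    try:
        return {
            "type": FOREST_TYPES[forest_type],
            "size": SIZE_CLASSES[i],
            "crown_cover": CROWN_COVER[j],
            "understory": UNDERSTORY[k],
        }
    except KeyError as exc:
        raise ValueError(f"index out of range in code {code!r}") from exc


print(decode_forest_code("F1 2 3 1"))
```

A lookup of this kind is one way the photo interpretation codes could be expanded into the attribute values used by the lower level GIS.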